Chapter 1
Introduction 



Information is embodied in physical objects. The paint on the ceiling of the Sistine
Chapel, the groove of a record, a single strand of DNA, and the spin of an individual
electron each reflect a configuration of a particular physical system, which governs
the way in which it interacts with the rest of the world. Michelangelo's painting
interacts with ambient light, reflecting a spectrum of colors viewed by churchgoers
and tourists alike. The spinning record induces vibrations in a needle, which are
converted to a flow of electrons which, when amplified, cause pressure waves to travel
through the air. The structure of DNA encodes instructions for both self-replication
and the construction of living beings. Interactions between individual electrons are
mediated by photons and can be modeled with great precision using the tools of
quantum electrodynamics. The physics of simplified models can be calculated using
the rules of quantum mechanics. Quantum particles embody quantum information.

Claude Shannon, motivated by engineering problems in communication theory, 
initiated the study of information theory as an abstract discipline. On the first page 
of his seminal paper [Hj, he writes 

The fundamental problem of communication is that of reproducing at one 
point either exactly or approximately a message selected at another point. 
. . . the actual message is one selected from a set of possible messages. 

He later states that "we wish to consider certain general problems involving
communication systems." These general problems, whose study is referred to
nowadays as "Shannon theory," concern characterizing the possibilities of reliably
transmitting certain "information sources" over "information-bearing channels." By
making probabilistic assumptions regarding the behaviors of the sources and channels,
a rich mathematical theory emerges which, in many cases, reasonably approximates
the underlying physics. Specifically, Shannon showed that if a channel is modeled by
a probability transition matrix $p(y|x)$, its capacity for the transmission of classical
information is given by

$$C = \max_{p(x)} I(X;Y).$$

This formula will be discussed in Section 3.3, where we introduce the mutual
information $I(X;Y)$, and in Section l^!T], where we discuss the proof of Shannon's theorem.
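The maximization over $p(x)$ in the capacity formula rarely has a closed form. As an illustration only (this is not a method used in this dissertation), the sketch below approximates $C$ with the standard Blahut-Arimoto alternating-maximization iteration. It assumes NumPy and stores the channel as a row-stochastic array `W[x, y]` $= p(y|x)$; the function names are hypothetical.

```python
import numpy as np


def mutual_information(p_x, W):
    """I(X;Y) in bits for input distribution p_x and channel W[x, y] = p(y|x)."""
    p_xy = p_x[:, None] * W                      # joint distribution p(x, y)
    p_y = p_xy.sum(axis=0)                       # output marginal p(y)
    mask = p_xy > 0                              # 0 log 0 = 0 by convention
    indep = p_x[:, None] * p_y[None, :]          # product distribution p(x)p(y)
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / indep[mask])))


def blahut_arimoto(W, iters=500):
    """Approximate C = max_{p(x)} I(X;Y); returns (capacity in bits, input distribution)."""
    n_x = W.shape[0]
    p_x = np.full(n_x, 1.0 / n_x)                # start from the uniform input
    for _ in range(iters):
        p_y = p_x @ W                            # current output distribution
        # Relative entropy D(W(.|x) || p_y) in nats for every input symbol x.
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(W > 0, np.log(W / p_y[None, :]), 0.0)
        d = np.sum(W * log_ratio, axis=1)
        p_x = p_x * np.exp(d)                    # multiplicative update
        p_x /= p_x.sum()
    return mutual_information(p_x, W), p_x


# Binary symmetric channel with crossover probability 0.1.
bsc = np.array([[0.9, 0.1],
                [0.1, 0.9]])
C_bsc, _ = blahut_arimoto(bsc)
```

For the binary symmetric channel the iteration returns the familiar value $1 - h(0.1) \approx 0.531$ bits, where $h$ is the binary entropy function.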
Shannon theory applies to network communication as well. A probability transition
matrix $p(z|x,y)$ models a situation where two senders transmit to a single receiver,
subject to noise and interference. The rates at which the senders can transmit
independent information were determined by Ahlswede and Liao [HJj to admit a
single-letter characterization, given by the convex hull of the closure of the set of
pairs of nonnegative rates $(R_X, R_Y)$ satisfying

$$\begin{aligned}
R_X &\le I(X;Z|Y),\\
R_Y &\le I(Y;Z|X),\\
R_X + R_Y &\le I(XY;Z)
\end{aligned}$$

for some $p(x)p(y)$. Further analysis by Cover, El Gamal and Salehi [H] gives
single-letter characterizations of a set of correlated sources which can be reliably
transmitted over a multiple access channel, generalizing the above, as well as
Slepian-Wolf source coding and cooperative multiple access channel capacity. They
also give a multi-letter expression for the capacity region, showing that an i.i.d.
source $(U,V)$ can be reliably transmitted if and only if

$$\begin{aligned}
H(U|V) &< \frac{1}{n} I(X^n;Z^n|Y^n V^n),\\
H(V|U) &< \frac{1}{n} I(Y^n;Z^n|X^n U^n),\\
H(U,V) &< \frac{1}{n} I(X^n Y^n;Z^n)
\end{aligned}$$

for some $n$ and $p(x^n|u^n)$, $p(y^n|v^n)$, where by $x^n$ we mean the sequence of
symbols $(x_1, \ldots, x_n)$. Here, $H(U,V)$ and $H(U|V)$ respectively denote the entropy
and conditional entropy of the pair of random variables $(U,V)$ which model the source.
Such a characterization is of limited practical use, however, as it does not apparently
lead to a finite computation for deciding whether or not a source can be transmitted.
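Unlike the multi-letter expression, the single-letter region of Ahlswede and Liao is directly computable for any fixed product input distribution $p(x)p(y)$. The sketch below evaluates its three bounds with NumPy, assuming the channel is stored as an array `W[x, y, z]` $= p(z|x,y)$. The noiseless binary adder channel $Z = X + Y$ in the example is a textbook illustration chosen here, not a channel studied in this dissertation, and the helper names are hypothetical.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in bits of an array of probabilities (any shape)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def mac_bounds(p_x, p_y, W):
    """Return (I(X;Z|Y), I(Y;Z|X), I(XY;Z)) for W[x, y, z] = p(z|x, y)."""
    p = p_x[:, None, None] * p_y[None, :, None] * W   # joint p(x, y, z)
    H_xyz = entropy(p)
    H_xy = entropy(p.sum(axis=2))
    H_xz = entropy(p.sum(axis=1))
    H_yz = entropy(p.sum(axis=0))
    H_x = entropy(p.sum(axis=(1, 2)))
    H_y = entropy(p.sum(axis=(0, 2)))
    H_z = entropy(p.sum(axis=(0, 1)))
    I_x = H_xy + H_yz - H_y - H_xyz                   # I(X;Z|Y)
    I_y = H_xy + H_xz - H_x - H_xyz                   # I(Y;Z|X)
    I_sum = H_xy + H_z - H_xyz                        # I(XY;Z)
    return I_x, I_y, I_sum


# Binary adder MAC: Z = X + Y in {0, 1, 2}, noiseless.
W = np.zeros((2, 2, 3))
for x in range(2):
    for y in range(2):
        W[x, y, x + y] = 1.0
uniform = np.array([0.5, 0.5])
bounds = mac_bounds(uniform, uniform, W)
```

With uniform inputs the bounds evaluate to 1, 1 and 1.5 bits, so each sender alone can reach 1 bit per channel use while the sum rate is limited to 1.5 bits.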

The concept of the entropy of a physical system initially arose out of attempts
to characterize the optimal efficiency of physical machines such as steam engines, as
well as to rule out the possibility of such constructions as perpetual motion machines.
The extensivity of entropy demands that it be additive for independent physical
systems. Boltzmann defined the entropy of a physical system to be proportional to
the logarithm of the number $W$ of microstates, or indistinguishable configurations of
its constituents, a definition he was likely led to because
$\log(W_1 W_2) = \log W_1 + \log W_2$, making additivity of entropy evident. In order
to circumvent mathematical subtleties which arise due to coarse grainings of the
system's configuration space, a probabilistic approach can be taken, allowing rigorous
mathematical statements to be made about related systems which are essentially
hidden Markov models [H]. A crucial philosophical step was taken by Boltzmann in his
work: he assumed that things were made of atoms. In his framework, heat was not a
fluid that flowed from warm to cold bodies; rather, vibrational energy of the
constituents of a physical system induces similar behavior in neighboring systems.
Without direct physical evidence to support the existence of atoms, Boltzmann
provided a mechanism for the flow of heat which assumed such ingredients did in fact
exist. The existence of atoms was experimentally verified soon after Boltzmann's
untimely death.

In the years that followed, the structure of atoms was intensely investigated. The 
assumption that atoms obey the laws of Newtonian mechanics quickly resulted in
various logical inconsistencies in the form of predictions which did not agree with 
experimental results. Quantum mechanics was reluctantly developed as a collection 
of fundamental assumptions about the nature of the physics of atoms and their con- 
stituents. A collection of mathematical rules was thus constructed which allowed 
theoretical calculations of certain aspects of experimental results. A caveat was that 
the new theory introduced randomness as a fundamental assumption of the theory, a 
feature which was quite unsettling to even the creators of quantum mechanics, most 
notably Einstein, who thought that "God does not play dice with the universe." 

Today, we live in a quantum world. Progress in applied physics and engineering
has begun to make manipulation of matter on the quantum scale a reality. It is
a strange world, at least when viewed with a classical mind. From the other side
of the fence, however, classical physics can be seen to be part of a quantum world.
The emergence of classicality due to phase transitions in systems of many particles is
one way this occurs. Mathematically, as we will see in Sections 2.1.7 and 2.2.3, the
tools and language of quantum theory enable the expression of concepts from classical
probability theory.

This opens up the possibility of analyzing communication scenarios in which the
senders and receivers process quantum information. In this case, the medium quite
literally is the message: rather than sending information by selecting a message
from a set, physical systems are suitably prepared for transmission to a receiver.
The possible types of quantum communication range from transmitting particles from
sender to receiver to generating entanglement between the users of a channel. Quite
remarkably, certain basic components from the classical theory find a place in the
quantum extension. The techniques used to separate possible quantum information
processing tasks from the impossible are directly analogous to those used in Shannon's
original program. Possibility questions of this nature have much in common with
the original motivations of thermodynamics. The ways in which entropy arises in
characterizing the answers further deepen this connection.

In this sense, one aspect of quantum information theory involves generalizing 
existing classical results to include quantum resources. While network Shannon theory 
is already a rich and active area of research, its quantum extension has new aspects which
do not fit into the former framework. This leads to a theory which, while including 
the old one as a special case, asks new questions leading to a deeper understanding 
of the physical nature of information. Apparently, quantum information is something 
new which cannot be properly analyzed with classical tools alone. 

In this dissertation, we will analyze quantum channels with many senders and a
single receiver, used in a variety of ways for the simultaneous transmission of classical
and quantum information. The presentation is an expanded version of the manuscript
[53]. At a high level, the results and approaches contained within mirror those of
classical Shannon theory. Yet, the mathematical tools utilized are distinctly quantum.
Let us end this introduction by giving a quote from Asher Peres and Daniel Terno's
paper P7j on quantum information and relativity, where it is written that "the goals
of quantum information theory are the intersection of those of quantum mechanics and
information theory, while its tools are the union of these two theories." Well said.



Chapter 2 



Background 



2.1 The basics 



2.1.1 Quantum mechanics 

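Let us briefly review some elements of quantum mechanics. The physical state of an
isolated $d$-level quantum system is described by a complex unit vector
$|\psi\rangle \in \mathbb{C}^d$. The notation $|\psi\rangle$, known as a ket or ket vector, refers to a
normalized column vector with $d$ complex components:

$$|\psi\rangle = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_d \end{pmatrix}, \quad \text{where } \sum_{i=1}^{d} |a_i|^2 = 1.$$

The conjugate transpose of $|\psi\rangle$ is a row vector

$$\langle\psi| = \begin{pmatrix} a_1^* & a_2^* & \cdots & a_d^* \end{pmatrix}.$$

$\langle\psi|$ is called a bra or bra vector. This notation (and nomenclature) was
introduced by Dirac, partly to emphasize the inner product structure of $\mathbb{C}^d$.
Indeed, the inner product between two state vectors $|\phi\rangle$ and $|\psi\rangle$ is written
as a bra times a ket, or bra-ket; if $|\phi\rangle$ has components $b_1, \ldots, b_d$, then

$$\langle\phi|\psi\rangle = \sum_{i=1}^{d} b_i^* a_i.$$

It is often useful to write a basis for $\mathbb{C}^d$ by defining a collection of kets as

$$|1\rangle = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad |2\rangle = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad |d\rangle = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}.$$

Then, a state such as $|\psi\rangle$ can be expanded in terms of this basis as

$$|\psi\rangle = a_1|1\rangle + a_2|2\rangle + \cdots + a_d|d\rangle.$$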

A measurement can be performed on the quantum system, obtaining classical
information regarding the system's current quantum state. Quantum mechanics is only
able to predict the probabilities of occurrence for each outcome of the measurement.
Further, the state of the system will generally be disturbed by the act of obtaining
this classical information. The simplest measurement to describe is a pure state
measurement, which is completely described in terms of some orthonormal basis for
$\mathbb{C}^d$. Such a basis will be called a measurement basis. Suppose that a pure state
measurement in the measurement basis $\{|1\rangle, \ldots, |d\rangle\}$ is made on the state
$|\psi\rangle$. Then,

• The measurement will return $y$ with probability

$$p(y) = \Pr\{\text{measure } |y\rangle\} = |\langle y|\psi\rangle|^2.$$

• If the measurement returns $y$, the post-measurement state is then $|y\rangle$.

In other words, the measurement result is modeled by a $\mathcal{Y} = \{1, \ldots, d\}$-valued
random variable $Y$, distributed as $p(y) = |\langle y|\psi\rangle|^2 = |a_y|^2$, and the
post-measurement state is a random vector $|Y\rangle$. If the same measurement is
performed again, the same result $Y$ is obtained with certainty, leaving the system in
the same state $|Y\rangle$ after the measurement.
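The measurement rule can be simulated directly. The following minimal sketch assumes NumPy, stores a measurement basis as the columns of a unitary matrix, and uses 0-based indices, so column 0 plays the role of $|1\rangle$; `measure_pure` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(seed=7)


def measure_pure(psi, basis):
    """Pure state measurement of |psi> in the orthonormal basis whose columns are |y>.

    Returns the outcome distribution p(y) = |<y|psi>|^2, a sampled outcome y,
    and the post-measurement state |y>."""
    amplitudes = basis.conj().T @ psi            # <y|psi> for every y
    probs = np.abs(amplitudes) ** 2              # Born rule
    y = int(rng.choice(len(probs), p=probs))     # sample the random variable Y
    return probs, y, basis[:, y]


# A normalized state in C^3, measured in the standard basis.
psi = np.array([1.0, 1.0j, -1.0]) / np.sqrt(3)
probs, y, post = measure_pure(psi, np.eye(3))

# Repeating the measurement on the post-measurement state gives y again.
probs_again, y_again, _ = measure_pure(post, np.eye(3))
```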
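2.1.2 Pure state ensembles

Here, let us fix a basis $\{|y\rangle\}_{y=1}^{d}$ for $\mathbb{C}^d$. Imagine now a game with two
parties, Alice and Bob. Assume that Alice has the ability to prepare any pure state
from the finite collection of states $\{|\psi_x\rangle\}_{x\in\mathcal{X}}$. Then, the probability that Bob
obtains measurement result $y$ given that Alice prepares state $|\psi_x\rangle$ is given by

$$p(y|x) = |\langle y|\psi_x\rangle|^2. \qquad (2.1)$$

Notice that this can be rewritten as

$$\begin{aligned}
|\langle y|\psi_x\rangle|^2 &= \langle y|\psi_x\rangle\left(\langle y|\psi_x\rangle\right)^*\\
&= \langle y|\psi_x\rangle\langle\psi_x|y\rangle\\
&= \langle y|\left(|\psi_x\rangle\langle\psi_x|\right)|y\rangle. \qquad (2.2)
\end{aligned}$$

We may interpret this as saying that if the 1-dimensional projection matrix
$|\psi_x\rangle\langle\psi_x|$ is written in the $\{|y\rangle\}$ basis, then $p(y|x)$ is equal to the diagonal
matrix element corresponding to $|y\rangle$.

Now, suppose that Alice gives Bob a random state, choosing $|\psi_x\rangle$ with
probability $p(x)$. In this case, we say that Alice is preparing an ensemble
$\{p(x), |\psi_x\rangle\}$ of pure states. Together with elementary probability, (2.2) can be
used to write the probability that Bob measures $y$ as

$$\begin{aligned}
p(y) &= \sum_x p(x)\,p(y|x)\\
&= \sum_x p(x)\,\langle y|\left(|\psi_x\rangle\langle\psi_x|\right)|y\rangle\\
&= \langle y|\Big(\sum_x p(x)\,|\psi_x\rangle\langle\psi_x|\Big)|y\rangle\\
&= \langle y|\rho|y\rangle,
\end{aligned}$$

where the third line is by linearity. The fourth line defines the density matrix

$$\rho = \sum_x p(x)\,|\psi_x\rangle\langle\psi_x|$$

of the ensemble $\{p(x), |\psi_x\rangle\}$, which contains all the data required to compute
all probabilities associated with any possible measurement on the ensemble, under the
assumption that Bob doesn't know the identities of the individual states. Note that
$\rho$ is Hermitian,

$$\rho^\dagger = \sum_x p(x)\left(|\psi_x\rangle\langle\psi_x|\right)^\dagger = \sum_x p(x)\,|\psi_x\rangle\langle\psi_x| = \rho,$$

and satisfies

$$\operatorname{Tr}\rho = \sum_x p(x)\operatorname{Tr}|\psi_x\rangle\langle\psi_x| = \sum_x p(x) = 1.$$

$\rho$, as we've constructed it, is also nonnegative definite. This is because for any
$|\phi\rangle \in \mathbb{C}^d$ we have

$$\langle\phi|\rho|\phi\rangle = \sum_x p(x)\,\langle\phi|\psi_x\rangle\langle\psi_x|\phi\rangle = \sum_x p(x)\,|\langle\phi|\psi_x\rangle|^2 \ge 0,$$

where the last inequality is because each term in the sum is nonnegative.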
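The density matrix of an ensemble, and the three properties just verified, can be checked numerically. This sketch assumes NumPy; the two-state ensemble (the first basis vector and $(|1\rangle + |2\rangle)/\sqrt{2}$, each with probability $1/2$) is an arbitrary illustrative choice, and `density_matrix` is a hypothetical helper name.

```python
import numpy as np


def density_matrix(probs, states):
    """rho = sum_x p(x) |psi_x><psi_x| for an ensemble {p(x), |psi_x>}."""
    d = len(states[0])
    rho = np.zeros((d, d), dtype=complex)
    for p, psi in zip(probs, states):
        rho += p * np.outer(psi, psi.conj())     # |psi><psi| = psi psi^dagger
    return rho


# Alice prepares |1> or (|1> + |2>)/sqrt(2) with equal probability (0-based arrays).
states = [np.array([1.0, 0.0]), np.array([1.0, 1.0]) / np.sqrt(2)]
rho = density_matrix([0.5, 0.5], states)

# p(y) = <y|rho|y>: the diagonal of rho in Bob's measurement basis.
p_y = np.real(np.diag(rho))
# The same numbers from p(y) = sum_x p(x) |<y|psi_x>|^2.
p_y_direct = sum(0.5 * np.abs(psi) ** 2 for psi in states)
```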
2.1.3 Density matrices 

We have now seen that if a quantum system is prepared in a random pure state, one
can write down its density matrix. This contains all of the data necessary to compute
the probabilities of the outcomes of any measurement that can be made on that
system, provided that the identities of the random pure states are unknown to the
measurer. For a system in a pure state $|\psi\rangle$ we will use the abbreviation
$\psi = |\psi\rangle\langle\psi|$ for the density matrix corresponding to that pure state (this is just
the matrix which projects onto the subspace spanned by $|\psi\rangle$). Let us define here
the collection of all density matrices of a $d$-level quantum system as

$$\mathcal{D}^d = \{\rho \in \mathbb{C}^{d\times d} : \rho = \rho^\dagger,\ \rho \ge 0,\ \operatorname{Tr}\rho = 1\}.$$

In other words, a density matrix $\rho \in \mathcal{D}^d$ is a Hermitian, nonnegative definite,
normalized matrix. We give the following facts about $\mathcal{D}^d$ without proof, as they are
proven in detail in many texts on quantum mechanics I34[ IHHj:

Property. $\mathcal{D}^d$ is convex.

Property. The extremal points of $\mathcal{D}^d$ are the projections onto rank 1 subspaces of
$\mathbb{C}^d$, corresponding to equivalence classes of pure states which are identified up to a
global phase factor $e^{i\theta}$.

Property. $\mathcal{D}^d$ is compact.

We may interpret the first fact as saying that if, with probability $p$, one chooses
to prepare a quantum system so that its density matrix is $\rho$, while with probability
$1-p$ it is instead prepared so that its density matrix is $\sigma$, someone who measures the
resulting system (and is also ignorant about which preparation was made) computes
measurement probabilities with the state $p\rho + (1-p)\sigma$.

The second fact illustrates that every density matrix can arise from some pure state
ensemble. This can be seen more directly, since the Hermiticity of $\rho$ implies that it is
diagonalizable as

$$\rho = \sum_{i=1}^{d} \lambda_i\,|i\rangle\langle i|$$

for some orthonormal basis $\{|i\rangle\}$ for $\mathbb{C}^d$. The positivity of $\rho$ implies that $\lambda_i \ge 0$,
and the fact that $\rho$ is normalized implies that $\sum_i \lambda_i = 1$, so the $\lambda_i$ may be
interpreted as probabilities, implying the existence of the required pure state
ensemble $\{\lambda_i, |i\rangle\}$. Note that, unless $\rho$ is a pure state, there is in fact an
uncountable number of ways in which it can arise by probabilistically preparing pure
states.

More importantly, the fact that the extremal points of $\mathcal{D}^d$ are pure states implies
that pure states are special, in that they cannot arise as nontrivial probabilistic
preparations of other states. A quantum system in a pure state is in a definite state.
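The spectral-decomposition argument and the non-uniqueness of ensembles can both be seen in a small example. The sketch below (NumPy assumed; helper names hypothetical) recovers an ensemble $\{\lambda_i, |i\rangle\}$ with `numpy.linalg.eigh`, then exhibits a second, non-orthogonal ensemble with the same density matrix.

```python
import numpy as np


def eigen_ensemble(rho):
    """Pure state ensemble {lambda_i, |i>} from rho = sum_i lambda_i |i><i|."""
    lam, vecs = np.linalg.eigh(rho)              # Hermitian input -> orthonormal eigenvectors
    lam = np.clip(lam, 0.0, None)                # discard tiny negative round-off
    return lam, [vecs[:, i] for i in range(len(lam))]


def mix(probs, states):
    """Density matrix sum_x p(x) |psi_x><psi_x| of an ensemble."""
    return sum(p * np.outer(s, s.conj()) for p, s in zip(probs, states))


rho = np.array([[0.75, 0.25],
                [0.25, 0.25]])
lam, eigvecs = eigen_ensemble(rho)

# A different ensemble: |1> and (|1> + |2>)/sqrt(2), each with probability 1/2.
other = mix([0.5, 0.5], [np.array([1.0, 0.0]), np.array([1.0, 1.0]) / np.sqrt(2)])
```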
2.1.4 Trace norm

For an arbitrary $M \in \mathbb{C}^{d\times d}$, its trace norm $|M|_1$ is defined as

$$|M|_1 = \operatorname{Tr}\sqrt{MM^\dagger}.$$

This is easily seen to be equal to the sum of the singular values of $M$. Indeed, writing
a singular value decomposition $M = U\Lambda V^\dagger$, it follows that

$$|M|_1 = \operatorname{Tr}\sqrt{U\Lambda V^\dagger V\Lambda U^\dagger} = \operatorname{Tr}U\sqrt{\Lambda^2}\,U^\dagger = \sum_{i=1}^{d}\lambda_i,$$

where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_d)$ with $\lambda_i \ge 0$. As $|\cdot|_1$ is a norm (or rather, a
unitarily invariant matrix norm), it satisfies the following properties:

Property (Positivity). $|M|_1 \ge 0$, while $|M|_1 = 0$ if and only if $M = 0$.
Property (Homogeneity). For any $c \in \mathbb{C}$, $|cM|_1 = |c|\,|M|_1$.
Property (Unitary invariance). $|M|_1 = |UMW|_1$ for any unitaries $U$ and $W$.
Property (Triangle inequality). $|M + N|_1 \le |M|_1 + |N|_1$.
Property (Submultiplicativity). $|MN|_1 \le |M|_1\,|N|_1$.

Positivity follows because the singular values of any matrix $M$ are always
nonnegative, and are all equal to zero if and only if $M = 0$. Homogeneity is true
because the singular values of $cM$ equal $|c|$ times those of $M$, and unitary invariance
holds because $UMW$ and $M$ have the same singular values. For proofs of the triangle
inequality and submultiplicativity, the reader is referred to [26].
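Both expressions for the trace norm, and the listed properties, are easy to check numerically. The sketch below assumes NumPy and random complex test matrices; it is a sanity check of the stated facts, not a proof, and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def trace_norm(M):
    """|M|_1 as the sum of the singular values of M."""
    return float(np.sum(np.linalg.svd(M, compute_uv=False)))


def trace_norm_from_definition(M):
    """|M|_1 = Tr sqrt(M M^dagger), via the eigenvalues of the PSD matrix M M^dagger."""
    ev = np.linalg.eigvalsh(M @ M.conj().T)
    return float(np.sum(np.sqrt(np.clip(ev, 0.0, None))))


def random_matrix(d):
    return rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))


def random_unitary(d):
    q, _ = np.linalg.qr(random_matrix(d))        # Q factor of a complex matrix is unitary
    return q


M, N = random_matrix(4), random_matrix(4)
U, W = random_unitary(4), random_unitary(4)
```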

The trace norm gives a natural metric space structure to $\mathbb{C}^{d\times d}$, which we will
exploit considerably throughout this dissertation. Given two matrices
$M, N \in \mathbb{C}^{d\times d}$, their trace distance is defined as the trace norm of their
difference, $|M - N|_1$. For two density matrices $\rho$ and $\sigma$ of a $d$-level quantum
system, their trace distance satisfies

$$0 \le |\rho - \sigma|_1 \le 2,$$

where the lower bound is saturated if and only if $\rho = \sigma$, while the upper bound is
saturated if and only if $\rho$ and $\sigma$ are supported on orthogonal subspaces. Let us
mention here the following alternative characterization of the trace distance between
two density matrices |M]:

$$|\rho - \sigma|_1 = 2\max_{0 \le A \le \mathbb{1}} \operatorname{Tr}A(\rho - \sigma).$$

The maximization above is over all nonnegative definite matrices $A$ with spectrum
bounded above by 1.

2.1.5 Fidelity

Given two density matrices $\rho$ and $\sigma$ of a $d$-level system, their fidelity is defined¹ as

$$F(\rho, \sigma) = \left(\operatorname{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\right)^2.$$

Fidelity can be expressed in terms of the trace norm as

$$F(\rho, \sigma) = |\sqrt{\rho}\sqrt{\sigma}|_1^2,$$

a form which makes apparent the symmetry of fidelity in its two arguments. The
following bounds are always satisfied whenever the arguments are density matrices:

$$0 \le F(\rho, \sigma) \le 1.$$

The lower bound is saturated if and only if $\rho$ and $\sigma$ have orthogonal support, while
the upper bound is saturated if and only if $\rho = \sigma$. Contrary to the situation with
the trace norm, a large value of the fidelity between two states signifies that they are
close. Fidelity is not a norm, but it can be related to the trace norm in various ways,
which are summarized in Section lUTTl.

¹Note that many authors (such as [31]) define this quantity as the square root of our definition.
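If one of the arguments of the fidelity is a pure state (say $\rho = \phi$), then

$$\begin{aligned}
F(|\phi\rangle, \sigma) &= \left(\operatorname{Tr}\sqrt{|\phi\rangle\langle\phi|\,\sigma\,|\phi\rangle\langle\phi|}\right)^2\\
&= \left(\operatorname{Tr}|\phi\rangle\langle\phi|\sqrt{\langle\phi|\sigma|\phi\rangle}\right)^2\\
&= \langle\phi|\sigma|\phi\rangle.
\end{aligned}$$

So $F(|\phi\rangle, \sigma)$ is just the diagonal matrix element of $\sigma$ corresponding to $|\phi\rangle$
when $\sigma$ is written in a basis including $|\phi\rangle$. Note that this is the success
probability for a pure state measurement which tests a system prepared in the state
$\sigma$ for the presence of the state $|\phi\rangle$. When both arguments are pure states, we obtain

$$F(|\phi\rangle, |\psi\rangle) = |\langle\phi|\psi\rangle|^2.$$

Finally, observe the following easily verifiable property, where the defining formula
for $F$ is applied to arbitrary nonnegative definite arguments.

Property (Homogeneity of fidelity). Fidelity is homogeneous in each argument under
nonnegative scaling, i.e. for any $c \ge 0$,

$$F(c\rho, \sigma) = cF(\rho, \sigma) = F(\rho, c\sigma).$$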
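The fidelity formulas, and the fact that the maximum in the trace-distance characterization is attained by the projector onto the positive eigenspace of $\rho - \sigma$, can be checked numerically. The sketch below assumes NumPy, computes matrix square roots by eigendecomposition, and uses the squared fidelity convention of this section; all helper names are hypothetical.

```python
import numpy as np


def psd_sqrt(A):
    """Unique positive square root of a Hermitian PSD matrix, via its eigendecomposition."""
    A = (A + A.conj().T) / 2                     # symmetrize away round-off
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T


def trace_norm(M):
    return float(np.sum(np.linalg.svd(M, compute_uv=False)))


def fidelity(rho, sigma):
    """F(rho, sigma) = (Tr sqrt(sqrt(rho) sigma sqrt(rho)))^2 (squared convention)."""
    s = psd_sqrt(rho)
    return float(np.real(np.trace(psd_sqrt(s @ sigma @ s)))) ** 2


def fidelity_via_trace_norm(rho, sigma):
    """Equivalent form F = |sqrt(rho) sqrt(sigma)|_1^2."""
    return trace_norm(psd_sqrt(rho) @ psd_sqrt(sigma)) ** 2


def helstrom_value(rho, sigma):
    """2 Tr A(rho - sigma) for A the projector onto the positive eigenspace of rho - sigma."""
    w, V = np.linalg.eigh(rho - sigma)
    P = V[:, w > 0]
    return float(2 * np.real(np.trace(P @ P.conj().T @ (rho - sigma))))


def random_density(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real


rng = np.random.default_rng(3)
rho, sigma = random_density(3, rng), random_density(3, rng)
phi = np.array([1.0, 0.0])
psi = np.array([1.0, 1.0]) / np.sqrt(2)
sigma2 = np.array([[0.75, 0.25], [0.25, 0.25]])
```

For the pure state $|\phi\rangle$ and `sigma2`, `fidelity` reproduces $\langle\phi|\sigma|\phi\rangle = 0.75$, as in the pure-state formula above.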



2.1.6 POVMs 

We describe here a certain general type of measurement which can be performed on a
$d$-level quantum system, called a positive operator valued measure (POVM). A POVM
is specified in terms of a finite collection of matrices $\{\Lambda_y \in \mathbb{C}^{d\times d}\}_{y\in\mathcal{Y}}$ which
are positive ($\Lambda_y \ge 0$) and sum to the $d \times d$ identity matrix $\mathbb{1}_d$:

$$\sum_y \Lambda_y = \mathbb{1}_d.$$

It is often said that the matrices $\{\Lambda_y\}_{y\in\mathcal{Y}}$ form a "partition of unity." If the
quantum system is in the state $\rho$, the probability of obtaining a measurement result
$y$ is given by

$$p(y) = \Pr\{\text{measure } \Lambda_y\} = \operatorname{Tr}\Lambda_y\rho.$$

Conditioned on having received the measurement result $y$, the post-measurement
state after such a measurement is computed as

$$\rho_y = \frac{\sqrt{\Lambda_y}\,\rho\,\sqrt{\Lambda_y}}{p(y)}.$$

Here, $\sqrt{\Lambda}$ is defined as the unique positive operator which satisfies
$\sqrt{\Lambda}\sqrt{\Lambda} = \Lambda$.² The measurement results in an ensemble of density matrices
$\{p(y), \rho_y\}$. A pure state measurement in the basis $\{|x\rangle\}$ can be expressed as the
POVM $\{|x\rangle\langle x|\}$ consisting of 1-dimensional projection matrices.
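A POVM and its square-root post-measurement rule can be sketched as follows. The example assumes NumPy and uses the three-outcome "trine" POVM on a qubit, $\Lambda_k = \tfrac{2}{3}|\phi_k\rangle\langle\phi_k|$ with real unit vectors $|\phi_k\rangle$ at angles $2\pi k/3$; this POVM is an illustrative choice, not one used in the text, and the helper names are hypothetical.

```python
import numpy as np


def psd_sqrt(A):
    """Unique positive square root of a Hermitian PSD matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T


def apply_povm(povm, rho):
    """Outcome probabilities Tr(Lambda_y rho) and post-measurement states
    sqrt(Lambda_y) rho sqrt(Lambda_y) / p(y)."""
    probs, posts = [], []
    for Lam in povm:
        p = float(np.real(np.trace(Lam @ rho)))
        probs.append(p)
        if p > 1e-12:
            R = psd_sqrt(Lam)
            posts.append(R @ rho @ R / p)
        else:
            posts.append(None)                   # this outcome never occurs
    return np.array(probs), posts


# Trine POVM: Lambda_k = (2/3)|phi_k><phi_k| with phi_k at angle 2*pi*k/3.
phis = [np.array([np.cos(2 * np.pi * k / 3), np.sin(2 * np.pi * k / 3)]) for k in range(3)]
trine = [(2 / 3) * np.outer(v, v) for v in phis]
rho = np.array([[1.0, 0.0], [0.0, 0.0]])         # the pure state |1>
probs, posts = apply_povm(trine, rho)
```

Here the outcome probabilities are $2/3$, $1/6$, $1/6$, and each post-measurement state is the pure state $|\phi_k\rangle\langle\phi_k|$.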

2.1.7 Classical systems 

Let $\mathcal{X}$ be a finite set and let $X$ be an $\mathcal{X}$-valued random variable, distributed
according to $p(x)$. We can define a vector space $\mathbb{C}^{|\mathcal{X}|}$ with a fixed orthonormal basis
$\{|x\rangle\}_{x\in\mathcal{X}}$, labeled by elements of the set $\mathcal{X}$. This sets up an identification
$|\cdot\rangle : \mathcal{X} \to \mathbb{C}^{|\mathcal{X}|}$ between the elements of $\mathcal{X}$ and that particular basis. By this
correspondence, the probability mass function $p(x)$ can be mapped to a density matrix

$$\rho = \sum_{x\in\mathcal{X}} p(x)\,|x\rangle\langle x| \qquad (2.3)$$

which is diagonal in the basis $\{|x\rangle\}_{x\in\mathcal{X}}$. Further, to every subset $S \subseteq \mathcal{X}$
corresponds a projection matrix $\Pi_S = \sum_{x\in S}|x\rangle\langle x|$, which commutes with $\rho$. In
addition, the projections $\Pi_S$ and $\Pi_T$ corresponding to any two subsets $S, T \subseteq \mathcal{X}$
commute. This way, we can express concepts from classical probability theory in the
language of quantum probability. Consider the following translations from classical to
quantum language:

$$\begin{aligned}
\Pr\{X \in S\} &= \operatorname{Tr}\rho\Pi_S\\
\Pr\{X \notin S\} &= 1 - \operatorname{Tr}\rho\Pi_S = \operatorname{Tr}\rho(\mathbb{1} - \Pi_S) = \operatorname{Tr}\rho\Pi_{S^c}\\
\Pr\{X \in S \text{ and } X \in T\} &= \operatorname{Tr}\rho\Pi_S\Pi_T = \operatorname{Tr}\rho\Pi_{S\cap T}\\
\Pr\{X \in S \text{ or } X \in T\} &= 1 - \Pr\{X \notin S \text{ and } X \notin T\}\\
&= \operatorname{Tr}\rho\left(\mathbb{1} - (\mathbb{1} - \Pi_S)(\mathbb{1} - \Pi_T)\right) = \operatorname{Tr}\rho\Pi_{S\cup T}.
\end{aligned}$$

From the early development of quantum mechanics, noncommutativity has been seen
to be the hallmark of quantum behavior. It is therefore to be expected that classical
probability, embedded in quantum theory's framework, is described entirely with
commuting matrices.

²Note that some authors use a more general kind of measurement, described by matrices
$\{M_y\}$ satisfying $\sum_y M_y^\dagger M_y = \mathbb{1}_d$. This amounts to choosing a different square root of
each $\Lambda_y$, giving post-measurement states which are unitarily equivalent to those of the
convention above, conditioned on the measurement result. Such measurements can be
modeled using the tools introduced in Section EZHl.
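The classical-to-quantum translations of Section 2.1.7 amount to statements about diagonal matrices, which makes them easy to check. The sketch below assumes NumPy, a hypothetical distribution on the alphabet $\{0, 1, 2, 3\}$ (0-based labels), and the subsets $S = \{0, 1\}$ and $T = \{1, 2\}$.

```python
import numpy as np

# Hypothetical example distribution on the alphabet {0, 1, 2, 3}.
p = np.array([0.1, 0.2, 0.3, 0.4])
rho = np.diag(p)                                 # rho = sum_x p(x) |x><x|, as in (2.3)
Id = np.eye(len(p))


def proj(S, n=len(p)):
    """Pi_S = sum_{x in S} |x><x|, a diagonal 0/1 projection."""
    return np.diag([1.0 if x in S else 0.0 for x in range(n)])


S, T = {0, 1}, {1, 2}
Pi_S, Pi_T = proj(S), proj(T)

pr_in_S = np.trace(rho @ Pi_S)                               # Pr{X in S}
pr_not_S = np.trace(rho @ (Id - Pi_S))                       # Pr{X not in S}
pr_and = np.trace(rho @ Pi_S @ Pi_T)                         # Pr{X in S and X in T}
pr_or = np.trace(rho @ (Id - (Id - Pi_S) @ (Id - Pi_T)))     # Pr{X in S or X in T}
```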

2.2 Composite quantum systems 

Let us begin by introducing a number of conventions which will be used when dealing
with multiple quantum systems. We will use capital letters from the beginning of the
alphabet $A, B, C, \ldots$ as labels for quantum systems. If $A$ is a quantum system, we
will abbreviate its number of levels as $|A|$ (which will always be finite), so that its pure
states are unit vectors in $\mathbb{C}^{|A|}$. A generic pure state of $A$ will then be written as
$|\psi\rangle^A$, while a generic density matrix of $A$ will be written $\rho^A$, to remind the reader to
which system the state refers. Whenever we initially introduce a state, the superscript
will identify the system it is describing, although later references to that state will not
always include the superscript. This convention will not be a cause for confusion, as
different symbols will refer to different states. We will also write the $|A| \times |A|$ identity
matrix on $\mathbb{C}^{|A|}$ as $\mathbb{1}^A$.

If $B$ is another quantum system, then $A$ and $B$ may be combined to form a
composite quantum system $AB$. This new system has $|A| \cdot |B| = |AB|$ levels. The
pure states of the new system are unit vectors in the tensor product
$\mathbb{C}^{|A|} \otimes \mathbb{C}^{|B|}$ of the individual vector spaces. The simplest way to define
$\mathbb{C}^{|A|} \otimes \mathbb{C}^{|B|}$ is as follows. First, fix arbitrary bases $\{|a\rangle^A\}_{a=1}^{|A|}$ and
$\{|b\rangle^B\}_{b=1}^{|B|}$ for $\mathbb{C}^{|A|}$ and $\mathbb{C}^{|B|}$. Then, $\mathbb{C}^{|A|} \otimes \mathbb{C}^{|B|}$ can be formally defined
as the linear span of the basis vectors formed by the product of the two individual
bases

$$\{|a\rangle^A \otimes |b\rangle^B\}_{a,b=1}^{|A|,|B|}.$$

A convenient shorthand for the tensor product of pure states is to write

$$|a\rangle^A|b\rangle^B = |a\rangle^A \otimes |b\rangle^B.$$

Then, any pure state of the quantum system $AB$ can be written as

$$|\Psi\rangle^{AB} = \sum_{a=1}^{|A|}\sum_{b=1}^{|B|} c_{ab}\,|a\rangle^A|b\rangle^B, \quad \text{where } \sum_{a,b}|c_{ab}|^2 = 1. \qquad (2.4)$$

Observe that this new vector space we have constructed has dimension $|A| \cdot |B|$. It is
not difficult to show that this construction is universal, meaning that it is independent
of the particular bases chosen for $A$ and for $B$.

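It will be useful here to describe a certain convention which can be used to write
down the tensor product of two column vectors as a single column vector. This will
amount to fixing a way to enumerate the components of the tensor. Suppose that
$v \in \mathbb{C}^{|A|}$ and $w \in \mathbb{C}^{|B|}$ are arbitrary column vectors

$$v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_{|A|} \end{pmatrix} \quad \text{and} \quad w = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_{|B|} \end{pmatrix}.$$

As $\mathbb{C}^{|A|} \otimes \mathbb{C}^{|B|} \cong \mathbb{C}^{|A||B|}$, we can "flatten" $v \otimes w$ into a single column
vector, organizing its components according to the following convention:

$$\text{flatten}: v \otimes w \mapsto \begin{pmatrix} v_1 w \\ v_2 w \\ \vdots \\ v_{|A|} w \end{pmatrix}.$$

In this way, the earlier generic state (2.4) can be expressed as

$$\text{flatten}: |\Psi\rangle^{AB} \mapsto \begin{pmatrix} c_{11} \\ \vdots \\ c_{1|B|} \\ c_{21} \\ \vdots \\ c_{2|B|} \\ \vdots \\ c_{|A||B|} \end{pmatrix}.$$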
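The flattening convention is exactly the ordering produced by NumPy's `np.kron`, and for the state (2.4) it is a row-major reshape of the coefficient matrix $[c_{ab}]$. The sketch below (NumPy assumed; `flatten_vec` is a hypothetical name) checks both claims.

```python
import numpy as np


def flatten_vec(v, w):
    """Stack the blocks v_1 w, v_2 w, ..., v_|A| w into one column vector."""
    return np.concatenate([v_a * w for v_a in v])


v = np.array([1.0, 2.0])                         # |A| = 2
w = np.array([3.0, 4.0, 5.0])                    # |B| = 3
flat = flatten_vec(v, w)

# Generic state (2.4): random coefficients c[a, b], normalized so sum |c_ab|^2 = 1.
rng = np.random.default_rng(1)
c = rng.normal(size=(2, 3)) + 1j * rng.normal(size=(2, 3))
c /= np.linalg.norm(c)
psi_flat = c.reshape(-1)                         # row-major: c_11, ..., c_1|B|, c_21, ...

# The same vector assembled as sum_ab c_ab |a> (x) |b>.
eA, eB = np.eye(2), np.eye(3)
psi_sum = sum(c[a, b] * np.kron(eA[a], eB[b]) for a in range(2) for b in range(3))
```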



It is often the case that a pure state such as $|\Psi\rangle^{AB}$ cannot be written as a
tensor product of pure states of its constituent systems, i.e.

$$|\Psi\rangle^{AB} \ne |\psi\rangle^A \otimes |\phi\rangle^B$$

for any pure states $|\psi\rangle^A$ and $|\phi\rangle^B$. If this is the case, then $|\Psi\rangle^{AB}$ is said to be
entangled. Nevertheless, for any pure state of the composite quantum system, there
exists a pair of orthonormal bases $\{|i\rangle^A\}$ and $\{|i\rangle^B\}$ such that

$$|\Psi\rangle^{AB} = \sum_{i=1}^{\min(|A|,|B|)} d_i\,|i\rangle^A|i\rangle^B, \qquad d_i \ge 0.$$

This form is called the Schmidt decomposition of $|\Psi\rangle^{AB}$. Together, the combination
of the orthonormal bases is called the Schmidt basis, while the $\{d_i\}$ are called the
Schmidt coefficients. These are easily calculated from the singular value decomposition
$[c_{ab}] = UDV^\dagger$ of the matrix of coefficients in (2.4): the Schmidt basis consists of the
left singular vectors (the columns of $U$) and the complex conjugates of the right
singular vectors (the columns of $V$), while the Schmidt coefficients are the singular
values themselves.
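The SVD recipe for the Schmidt decomposition takes only a few lines. This sketch assumes NumPy and the flattening convention above; `schmidt` and `rebuild` are hypothetical helper names, and the random, maximally entangled and product test states are illustrative.

```python
import numpy as np


def schmidt(c):
    """Schmidt decomposition of |Psi> = sum_ab c[a, b] |a>|b> from the SVD c = U diag(d) V^dagger.

    Returns the coefficients d_i, the vectors |i>^A (columns of U) and |i>^B
    (rows of V^dagger, i.e. complex conjugates of the columns of V)."""
    U, d, Vh = np.linalg.svd(c, full_matrices=False)
    return d, [U[:, i] for i in range(len(d))], [Vh[i, :] for i in range(len(d))]


def rebuild(d, vecs_A, vecs_B):
    """Flattened vector sum_i d_i |i>^A (x) |i>^B."""
    return sum(d_i * np.kron(a, b) for d_i, a, b in zip(d, vecs_A, vecs_B))


rng = np.random.default_rng(4)
c = rng.normal(size=(2, 3)) + 1j * rng.normal(size=(2, 3))
c /= np.linalg.norm(c)
d, vecs_A, vecs_B = schmidt(c)

bell = np.eye(2) / np.sqrt(2)                    # (|11> + |22>)/sqrt(2)
product = np.outer([1.0, 0.0], [0.6, 0.8])       # |1> (x) (0.6|1> + 0.8|2>)
```

A product state has a single nonzero Schmidt coefficient, while the maximally entangled example has two equal ones.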

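Just as the tensor product builds larger vector spaces out of pairs of smaller ones,
it also builds larger matrices from pairs of smaller ones. Fix two matrices
$M \in \mathbb{C}^{|C|\times|A|}$ and $N \in \mathbb{C}^{|D|\times|B|}$. Recall that these are linear operators

$$M : \mathbb{C}^{|A|} \to \mathbb{C}^{|C|} \quad \text{and} \quad N : \mathbb{C}^{|B|} \to \mathbb{C}^{|D|}.$$

Their tensor product $M \otimes N$ is another linear operator

$$(M \otimes N) : \mathbb{C}^{|A|} \otimes \mathbb{C}^{|B|} \to \mathbb{C}^{|C|} \otimes \mathbb{C}^{|D|}.$$

We will abbreviate this by writing

$$M : A \to C, \quad N : B \to D \quad \text{and} \quad (M \otimes N) : AB \to CD.$$

This new object acts on the tensor product of vectors as

$$(M \otimes N)\left(|\psi\rangle^A \otimes |\phi\rangle^B\right) = \left(M|\psi\rangle^A\right) \otimes \left(N|\phi\rangle^B\right),$$

and linearity defines the action of $M \otimes N$ on all of $\mathbb{C}^{|A|} \otimes \mathbb{C}^{|B|}$. The tensor
product is also bilinear; in particular, for any $c \in \mathbb{C}$,

$$c(M \otimes N) = (cM) \otimes N = M \otimes (cN).$$

In the same vein as the "flattened" representation $\mathbb{C}^{|A|} \otimes \mathbb{C}^{|B|} \cong \mathbb{C}^{|A||B|}$ for the
tensor product of vectors, there is a more general mapping
$\mathbb{C}^{|C|\times|A|} \otimes \mathbb{C}^{|D|\times|B|} \to \mathbb{C}^{|C||D|\times|A||B|}$ given by

$$\text{flatten}: M \otimes N \mapsto \begin{pmatrix} m_{11}N & m_{12}N & \cdots & m_{1|A|}N \\ m_{21}N & m_{22}N & \cdots & m_{2|A|}N \\ \vdots & \vdots & \ddots & \vdots \\ m_{|C|1}N & m_{|C|2}N & \cdots & m_{|C||A|}N \end{pmatrix}.$$

Note our convention, where the blocks are labelled by elements of the left-most
component of the tensor product. We will use that convention throughout this
dissertation. It is easy to see that calculations can be made in this representation,
namely that

$$\text{flatten}\{M \otimes N\}\,\text{flatten}\{|\psi\rangle|\phi\rangle\} = \text{flatten}\{(M \otimes N)|\psi\rangle|\phi\rangle\}.$$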
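The block-matrix flattening of $M \otimes N$ coincides with `np.kron(M, N)`, which gives a quick numerical check of the identity above. The sketch assumes NumPy; the dimensions $|A| = 2$, $|B| = 4$, $|C| = 3$, $|D| = 2$ are arbitrary, and `flatten_op` is a hypothetical name.

```python
import numpy as np


def flatten_op(M, N):
    """Block matrix whose (i, j) block is m_ij N, i.e. blocks labelled by the left factor."""
    return np.block([[M[i, j] * N for j in range(M.shape[1])] for i in range(M.shape[0])])


rng = np.random.default_rng(2)
M = rng.normal(size=(3, 2))      # M : A -> C with |A| = 2, |C| = 3
N = rng.normal(size=(2, 4))      # N : B -> D with |B| = 4, |D| = 2
psi = rng.normal(size=2)         # vector on A
phi = rng.normal(size=4)         # vector on B

lhs = flatten_op(M, N) @ np.kron(psi, phi)   # flatten{M (x) N} flatten{|psi>|phi>}
rhs = np.kron(M @ psi, N @ phi)              # flatten{(M|psi>) (x) (N|phi>)}
```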

As the composite system $AB$ is a quantum system itself, it includes a (strictly)
larger collection of von Neumann measurements and unitary evolutions than those
built from separate operations on $A$ and $B$. Indeed, given any two orthonormal
bases $\{|i\rangle^{AB}\}$ and $\{|i'\rangle^{AB}\}$ for $\mathbb{C}^{|A|} \otimes \mathbb{C}^{|B|}$, indexed so that each $|i\rangle$ is
paired with $|i'\rangle$, they are related by a particular unitary matrix $U$, defined as

$$U = \sum_i |i'\rangle\langle i|.$$

It is not hard to see that any joint von Neumann measurement on the combined
system $AB$ can be performed using separate product measurements on $A$ and $B$,
provided that the unitary which takes the intended measurement basis to the required
product basis (and its inverse) can be implemented.

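Of particular interest is the subject of local measurements on a composite quantum
system. Suppose that a measurement $\{\Lambda_x\}_{x\in\mathcal{X}}$ is made on the $A$ part of the
bipartite state $\rho^{AB}$. New measurement operators $\{\Lambda_x \otimes \mathbb{1}^B\}_{x\in\mathcal{X}}$ can be
constructed, so that

$$p(x) = \operatorname{Tr}\rho(\Lambda_x \otimes \mathbb{1}^B).$$

The post-measurement states are given as before:

$$\rho_x = \frac{(\sqrt{\Lambda_x} \otimes \mathbb{1}^B)\,\rho\,(\sqrt{\Lambda_x} \otimes \mathbb{1}^B)}{p(x)}.$$

It is instructive to see what happens if a local pure state measurement is made on
part of a bipartite pure state $|\Psi\rangle^{AB}$. Here, $\Lambda_x = |x\rangle\langle x|^A$ and we obtain

$$p(x) = \operatorname{Tr}\Psi(|x\rangle\langle x| \otimes \mathbb{1}^B).$$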
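As a first step, express $|\Psi\rangle^{AB}$ in terms of the new basis for $A$ as

$$|\Psi\rangle^{AB} = \sum_{x,b} c'_{xb}\,|x\rangle^A|b\rangle^B. \qquad (2.5)$$

Note that the new coefficients $c'_{xb}$ are related to the old ones via

$$c'_{xb} = \sum_a U_{xa}\,c_{ab},$$

where $U : \{|a\rangle\} \to \{|x\rangle\}$ is the unitary change of basis matrix, with entries
$U_{xa} = \langle x|a\rangle$. Then,

$$\begin{aligned}
|\Psi\rangle^{AB} &= \sum_{x,b} c'_{xb}\,|x\rangle^A|b\rangle^B\\
&= \sum_x |x\rangle^A \otimes \Big(\sum_b c'_{xb}\,|b\rangle^B\Big)\\
&= \sum_x |x\rangle^A|\tilde\psi_x\rangle^B\\
&= \sum_x \beta_x\,|x\rangle^A|\psi_x\rangle^B.
\end{aligned}$$

The third step above defines the unnormalized vector $|\tilde\psi_x\rangle^B = \sum_b c'_{xb}|b\rangle^B$,
while in the last, the normalization constant
$\beta_x = \sqrt{\langle\tilde\psi_x|\tilde\psi_x\rangle} = \sqrt{\sum_b |c'_{xb}|^2}$ and the normalized state
$|\psi_x\rangle = \beta_x^{-1}|\tilde\psi_x\rangle$ are defined (whenever $\beta_x > 0$). Now, it is a simple task to compute

$$\begin{aligned}
p(x) &= \operatorname{Tr}\Psi^{AB}(|x\rangle\langle x| \otimes \mathbb{1}^B)\\
&= \langle\Psi|^{AB}(|x\rangle\langle x| \otimes \mathbb{1}^B)|\Psi\rangle^{AB}\\
&= \sum_{x'',x'} \beta_{x''}\beta_{x'}\,\langle x''|x\rangle\langle x|x'\rangle\,\langle\psi_{x''}|\psi_{x'}\rangle\\
&= \beta_x^2.
\end{aligned}$$

Conditioned on having received the measurement result $x$, the post-measurement
state is

$$\begin{aligned}
\Psi_x^{AB} &= \frac{(|x\rangle\langle x| \otimes \mathbb{1}^B)\,\Psi^{AB}\,(|x\rangle\langle x| \otimes \mathbb{1}^B)}{p(x)}\\
&= \frac{(|x\rangle\langle x| \otimes \mathbb{1}^B)\Big(\sum_{x'',x'}\beta_{x''}\beta_{x'}\,|x''\rangle\langle x'| \otimes |\psi_{x''}\rangle\langle\psi_{x'}|\Big)(|x\rangle\langle x| \otimes \mathbb{1}^B)}{p(x)}\\
&= |x\rangle\langle x| \otimes |\psi_x\rangle\langle\psi_x|.
\end{aligned}$$

Or rather,

$$|\Psi_x\rangle^{AB} = |x\rangle^A|\psi_x\rangle^B.$$

So, we see that a measurement on $A$ causes the state of $B$ to "collapse" as well.
More precisely, the measurement on $A$ creates a pure state ensemble
$\{p(x), |\psi_x\rangle^B\}$ on $B$. If an arbitrary POVM is performed on $A$, an ensemble of
density matrices on $B$ will generally result. To see this, we need to introduce the
partial trace.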
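The local-measurement calculation can be mirrored numerically: rotate the coefficient matrix into the new basis for $A$, read off $\beta_x$ as row norms, and normalize the rows. The sketch assumes NumPy, stores the basis $\{|x\rangle\}$ as matrix columns, and uses a maximally entangled state measured in the Hadamard basis as an illustrative example; `measure_A` is a hypothetical helper.

```python
import numpy as np


def measure_A(c, basis_A):
    """Measure system A of |Psi> = sum_ab c[a, b] |a>|b> in the orthonormal basis
    whose columns are the vectors |x>. Returns p(x) and the states |psi_x>^B."""
    c_new = basis_A.conj().T @ c                 # c'_{xb} = sum_a <x|a> c_{ab}
    beta = np.linalg.norm(c_new, axis=1)         # beta_x = norm of sum_b c'_{xb} |b>
    states_B = [row / b if b > 0 else None for row, b in zip(c_new, beta)]
    return beta ** 2, states_B                   # p(x) = beta_x^2


c = np.eye(2) / np.sqrt(2)                       # maximally entangled state
hadamard = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)   # columns form the |x> basis
probs, states_B = measure_A(c, hadamard)

# Cross-check p(x) = <Psi|(|x><x| (x) 1^B)|Psi> in the flattened representation.
Psi = c.reshape(-1)
direct = []
for x in range(2):
    ket_x = hadamard[:, x]
    proj = np.kron(np.outer(ket_x, ket_x.conj()), np.eye(2))   # |x><x| (x) 1^B
    direct.append(float(np.real(Psi.conj() @ proj @ Psi)))
```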

2.2.1 Partial trace 

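If we are instead concerned only with the measurement probabilities, and not with
the post-measurement states, it is convenient to work with a density matrix on $A$ to
compute the measurement probabilities. This density matrix is defined in terms of
the partial trace over $B$. Fixing a bipartite density matrix $\Omega^{AB}$, the partial trace
over $B$ of $\Omega^{AB}$ can be defined as the unique density matrix $\operatorname{Tr}_B\Omega$ on $A$ such that
for every $M \in \mathbb{C}^{|A|\times|A|}$,

$$\operatorname{Tr}M(\operatorname{Tr}_B\Omega) = \operatorname{Tr}(M \otimes \mathbb{1}^B)\Omega.$$

An equivalent way to define $\operatorname{Tr}_B\Omega$ is as follows. If we write
$\Omega_{a'a,b'b} = (\langle a'|\langle b'|)\,\Omega\,(|a\rangle|b\rangle)$ and $(\operatorname{Tr}_B\Omega)_{a'a} = \langle a'|(\operatorname{Tr}_B\Omega)|a\rangle$, then

$$(\operatorname{Tr}_B\Omega)_{a'a} = \sum_b \Omega_{a'a,bb}.$$

With this in hand, we can express

$$\operatorname{Tr}(\Lambda_x \otimes \mathbb{1}^B)\Omega = \operatorname{Tr}\Lambda_x(\operatorname{Tr}_B\Omega).$$

For any square matrix $M$ on $AB$, the partial traces over $A$ and $B$ satisfy the
following easily verifiable properties:

$$\operatorname{Tr}M = \operatorname{Tr}_{AB}M = \operatorname{Tr}_A\operatorname{Tr}_B M = \operatorname{Tr}_B\operatorname{Tr}_A M.$$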



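A perhaps more concrete definition of the partial trace is obtained by writing a
bipartite density matrix in the flattened representation

$$\text{flatten}: \Omega^{AB} \mapsto \begin{pmatrix} \omega_{11} & \cdots & \omega_{1|A|} \\ \vdots & \ddots & \vdots \\ \omega_{|A|1} & \cdots & \omega_{|A||A|} \end{pmatrix},$$

where each $\omega_{aa'} \in \mathbb{C}^{|B|\times|B|}$. Then, $\operatorname{Tr}_A\Omega^{AB}$ is obtained by summing the blocks
on the diagonal,

$$\operatorname{Tr}_A\Omega^{AB} = \sum_a \omega_{aa},$$

and $\operatorname{Tr}_B\Omega^{AB}$ by taking the trace of each block separately,

$$\operatorname{Tr}_B\Omega^{AB} = \begin{pmatrix} \operatorname{Tr}\omega_{11} & \cdots & \operatorname{Tr}\omega_{1|A|} \\ \vdots & \ddots & \vdots \\ \operatorname{Tr}\omega_{|A|1} & \cdots & \operatorname{Tr}\omega_{|A||A|} \end{pmatrix}.$$

In fact, this representation will allow us to define the following partial product

$$\langle a'|\Omega|a\rangle = \omega_{a'a},$$

allowing the partial trace over $A$ to be expressed in the same way as the usual trace:

$$\operatorname{Tr}_A\Omega = \sum_a \langle a|\Omega|a\rangle = \sum_a \omega_{aa}.$$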
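For the generic state $\Psi^{AB}$ written in the form (2.5), let us compute

$$\begin{aligned}
\operatorname{Tr}_A\Psi^{AB} &= \sum_x \langle x|\Psi|x\rangle = \sum_x \sum_{x'',x'} \beta_{x''}\beta_{x'}\,\langle x|x''\rangle\langle x'|x\rangle\,|\psi_{x''}\rangle\langle\psi_{x'}|\\
&= \sum_x \beta_x^2\,|\psi_x\rangle\langle\psi_x| = \sum_x p(x)\,|\psi_x\rangle\langle\psi_x|.
\end{aligned}$$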
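The index formula and the block description of the partial trace translate directly into array operations. The sketch below assumes NumPy and the flattening convention of this section (blocks labelled by the left-most factor $A$, matching `np.kron`); the helper names are hypothetical and the test matrices are random.

```python
import numpy as np


def partial_traces(Omega, dA, dB):
    """Return (Tr_A Omega, Tr_B Omega) for a flattened (dA*dB) x (dA*dB) matrix."""
    T = Omega.reshape(dA, dB, dA, dB)            # T[a', b', a, b] = (<a'|<b'|) Omega (|a>|b>)
    tr_A = np.einsum("ibic->bc", T)              # sum over a' = a
    tr_B = np.einsum("aibi->ab", T)              # sum over b' = b
    return tr_A, tr_B


def blocks(Omega, dA, dB):
    """The |B| x |B| blocks omega_{a'a} of the flattened representation."""
    return [[Omega[i * dB:(i + 1) * dB, j * dB:(j + 1) * dB] for j in range(dA)]
            for i in range(dA)]


rng = np.random.default_rng(5)
dA, dB = 2, 3
G = rng.normal(size=(dA * dB,) * 2) + 1j * rng.normal(size=(dA * dB,) * 2)
Omega = G @ G.conj().T
Omega /= np.trace(Omega).real                    # a random density matrix on AB
tr_A, tr_B = partial_traces(Omega, dA, dB)
w = blocks(Omega, dA, dB)
M = rng.normal(size=(dA, dA))                    # arbitrary test matrix on A
```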

2.2.2 Purifications and extensions 

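Given an arbitrary density matrix $\rho^B$, it is easy to construct a pure state $|\Psi\rangle^{AB}$
such that $\operatorname{Tr}_A\Psi = \rho$. The state $|\Psi\rangle^{AB}$ is called a purification of $\rho$. The
construction is as follows. First, choose any pure state ensemble $\{p(x), |\psi_x\rangle^B\}$
giving rise to $\rho^B$, in the sense that

$$\rho^B = \sum_x p(x)\,|\psi_x\rangle\langle\psi_x|^B.$$

Then, the state

$$|\Psi\rangle^{AB} = \sum_x \sqrt{p(x)}\,|x\rangle^A|\psi_x\rangle^B$$

is a purification of $\rho^B$. This is easy to see by computing the partial trace over $A$,
which was done for a pure state of the same form in the last subsection.

More generally, we will speak of an extension $\Omega^{AB}$ of a density matrix $\rho^B$, which
is just any density matrix (not necessarily a pure state) for which $\operatorname{Tr}_A\Omega = \rho$. It is
easy to see that any purification $|\Psi\rangle^{ABC}$ of $\Omega^{AB}$ is a purification of $\rho^B$ as well, since

$$\operatorname{Tr}_{AC}\Psi = \operatorname{Tr}_A(\operatorname{Tr}_C\Psi) = \operatorname{Tr}_A\Omega = \rho.$$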
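The purification construction can be checked on a small example. This sketch assumes NumPy and, as one convenient choice among many, builds the ensemble from the eigendecomposition of $\rho$, giving $A$ as many levels as $B$; `purify` and `trace_out_A` are hypothetical names.

```python
import numpy as np


def purify(rho):
    """|Psi>^{AB} = sum_x sqrt(p(x)) |x>^A |psi_x>^B for rho^B, using the spectral
    ensemble of rho (any pure state ensemble for rho would work equally well)."""
    p, vecs = np.linalg.eigh(rho)
    p = np.clip(p, 0.0, None)
    d = rho.shape[0]
    basis_A = np.eye(d)
    return sum(np.sqrt(p[x]) * np.kron(basis_A[x], vecs[:, x]) for x in range(d))


def trace_out_A(Psi, dA, dB):
    """Tr_A |Psi><Psi|: entry (b', b) is sum_a c_{ab'} conj(c_{ab})."""
    c = Psi.reshape(dA, dB)
    return c.T @ c.conj()


rho = np.array([[0.75, 0.25],
                [0.25, 0.25]])
Psi = purify(rho)
```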


2.2.3 Classical-quantum (cq) systems 

Consider now a collection of density matrices $\{\sigma_x^A\}_{x\in\mathcal{X}}$ indexed by the finite set $\mathcal{X}$. If those states occur according to the probability mass function $p(x)$, we may speak of an ensemble $\{p(x), \sigma_x\}$ of quantum states. In order to treat classical and quantum probabilities in the same framework, a joint density matrix can be constructed:

$$\sigma^{XA} = \sum_x p(x)\,|x\rangle\langle x|^X \otimes \sigma_x^A.$$

This is known as a cq state, and it describes the classical and quantum aspects of the ensemble on the extended Hilbert space $\mathbb{C}^{|\mathcal{X}|}\otimes\mathbb{C}^{|A|}$ [H]. The semiclassical nature of the ensemble is reflected in the embedding of a direct sum of Hilbert spaces $\bigoplus_{x\in\mathcal{X}}\mathbb{C}^{|A|}$ into $\mathbb{C}^{|\mathcal{X}|}\otimes\mathbb{C}^{|A|}$. This should be compared with what was done in Section 2.1.7, where a direct sum of one-dimensional vector spaces $\mathbb{C}$ was embedded into $\mathbb{C}^{|\mathcal{X}|}$. Just as the classical density matrix $\rho$ from (2.3) was diagonal in a basis corresponding to elements of $\mathcal{X}$, the cq density matrix $\sigma$ is block-diagonal, where the diagonal block corresponding to $x$ contains the non-normalized density matrix $p(x)\sigma_x$. The classical state is recoverable as $\rho = \mathrm{Tr}_A\,\sigma$, while the average quantum state is $\mathrm{Tr}_X\,\sigma = \sum_{x\in\mathcal{X}} p(x)\sigma_x$.
The classical-quantum formalism is not only of interest in its own right; information 
quantities evaluated on cq states play an important role in characterizing what is 
possible in quantum information theory. 
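The block-diagonal structure and the two marginals of a cq state are easy to check numerically. This NumPy sketch assumes $X$ is the first tensor factor and uses random example states $\sigma_x$; `random_density_matrix` is a hypothetical helper.

```python
import numpy as np

def random_density_matrix(d, rng):
    """A random d x d density matrix G G^dagger / Tr(G G^dagger)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    r = g @ g.conj().T
    return r / np.trace(r)

rng = np.random.default_rng(3)
p = np.array([0.6, 0.4])                        # example p(x)
dX, dA = len(p), 2
sigmas = [random_density_matrix(dA, rng) for _ in range(dX)]

# sigma^{XA} = sum_x p(x) |x><x|^X (x) sigma_x^A, with X as the first factor
E = np.eye(dX)
sigma_XA = sum(px * np.kron(np.outer(E[x], E[x]), s)
               for x, (px, s) in enumerate(zip(p, sigmas)))

t = sigma_XA.reshape(dX, dA, dX, dA)
classical = np.einsum('xaya->xy', t)            # Tr_A sigma
average = np.einsum('xaxb->ab', t)              # Tr_X sigma
print(np.allclose(classical, np.diag(p)),
      np.allclose(average, sum(px * s for px, s in zip(p, sigmas))),
      np.allclose(t[0, :, 1, :], 0))            # off-diagonal block vanishes
```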

2.3 Dynamics 

We have already seen an example of quantum dynamics, namely, the measurement
process. In this section, we introduce the most general types of dynamical processes
we will consider in this dissertation. The approach taken here will be to consider 
quantum channels whose inputs and/or outputs are classical-quantum systems. But 
first, let us review the notion of classical channels. 

2.3.1 Classical channels 

A discrete classical channel with input symbols belonging to a finite alphabet $\mathcal{X}$ and output symbols from a finite alphabet $\mathcal{Y}$ is modeled by a collection of transition probabilities $p(y|x)$. These probabilities comprise a stochastic matrix $[p(y|x)]_{yx}$, because the following two conditions are satisfied:

$$p(y|x) \ge 0 \quad \text{for each } (x,y) \in \mathcal{X}\times\mathcal{Y}$$

and

$$\sum_y p(y|x) = 1 \quad \text{for each } x \in \mathcal{X},$$

ensuring that to each input symbol $x$ there corresponds a conditional probability mass function on the output symbols $y$.

Given an $\mathcal{X}$-valued random variable $X$ with probability mass function $p(x)$, the action of the channel then defines another random variable $Y$, jointly distributed with $X$ according to

$$p(x,y) = p(x)\,p(y|x).$$

Alternatively, we may view $p(y|x)$ as a linear map from the simplex of probability mass functions on $\mathcal{X}$ to the simplex of probability mass functions on $\mathcal{Y}$, via

$$p(x) \mapsto p(y) = \sum_x p(x)\,p(y|x).$$
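The two views of a classical channel, as a joint distribution and as a linear map on the probability simplex, can be sketched with a column-stochastic matrix. The binary symmetric channel and the input distribution below are assumed examples, and the array layout `W[y, x]` $= p(y|x)$ is a convention chosen here.

```python
import numpy as np

eps = 0.1  # crossover probability of a binary symmetric channel (example)
# W[y, x] = p(y|x): each column is a conditional pmf, so W is column-stochastic.
W = np.array([[1 - eps, eps],
              [eps, 1 - eps]])
assert np.all(W >= 0) and np.allclose(W.sum(axis=0), 1)

p_x = np.array([0.7, 0.3])
p_xy = p_x[:, None] * W.T        # p(x, y) = p(x) p(y|x), rows indexed by x
p_y = W @ p_x                    # p(y) = sum_x p(x) p(y|x)
print(p_y)
```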

In this sense, a classical channel is a model for a device which allows a sender to 
"prepare probability mass functions" at the output. This way of looking at classical 
channels leads to our first "partial" quantum generalization, described in the next 
section. 

2.3.2 Classical-quantum (c → q) channels

This generalization of classical channels consists of channels with a classical input and a quantum output. Instead of preparing probability mass functions at the output, however, the sender now prepares density matrices. A c → q channel $X \to B$ is specified by a collection of conditional density matrices $\{\rho_x^B\}_{x\in\mathcal{X}}$, labeled by the elements of a finite set $\mathcal{X}$. As with classical channels, such maps extend to mappings from the simplex of probability mass functions on the input alphabet $\mathcal{X}$ to the density matrices on the output quantum system $B$ via

$$p(x) \mapsto \sum_x p(x)\,\rho_x^B.$$
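A c → q channel can be represented in code as a list of density matrices. In this NumPy sketch, which assumes the example channel preparing $|0\rangle$ for $x = 0$ and $|+\rangle$ for $x = 1$, the helper `cq_channel` (a hypothetical name) implements the map $p(x) \mapsto \sum_x p(x)\rho_x^B$.

```python
import numpy as np

# An example c -> q channel with input alphabet {0, 1}:
# x = 0 prepares |0>, x = 1 prepares |+> = (|0> + |1>)/sqrt(2).
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
rho_x = [np.outer(ket0, ket0), np.outer(ketp, ketp)]

def cq_channel(p):
    """Map a pmf p(x) on the input alphabet to the density matrix sum_x p(x) rho_x^B."""
    return sum(px * r for px, r in zip(p, rho_x))

out = cq_channel([0.5, 0.5])
print(np.round(out, 3))
```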

Such channels were implicitly considered in Section 2.2.3, where we saw that if the input is modeled by a random variable $X$ distributed according to $p(x)$, the combined input-output is a cq system with cq state

$$\rho^{XB} = \sum_x p(x)\,|x\rangle\langle x|^X \otimes \rho_x^B.$$

The collection of c → q channels with the same input set $\mathcal{X}$ and output quantum system $B$ has the structure of a compact convex set. Given two such channels with conditional density matrices $\{\rho_x\}_{x\in\mathcal{X}}$ and $\{\sigma_x\}_{x\in\mathcal{X}}$, and $0 \le \lambda \le 1$, their corresponding convex combination has conditional density matrices $\{\lambda\rho_x + (1-\lambda)\sigma_x\}_{x\in\mathcal{X}}$. The extremal points of this convex set have conditional density matrices which are extremal in the convex set of density matrices on $B$. In other words, the extremal points consist of channels which prepare pure states. This fact will be important when we discuss classical capacities of quantum channels in Section ??.

2.3.3 Unitary quantum channels 

The simplest quantum channel is a unitary transformation. For a closed quantum system $A$, this is the kind of evolution predicted by the Schrödinger equation. We will write

$$U: A \to A$$

to reflect the fact that $U \in \mathbb{C}^{|A|\times|A|}$ is a square matrix mapping $A$ to itself.
In this thesis, we will be exploring the consequences for processing quantum information which result from the ability to cause any unitary evolution to occur to a given quantum system. Ensuring that a quantum system undergoes a particular unitary evolution is generally a difficult engineering task, since it involves influencing the system in just the right way, from the outside, so as to inhibit its natural tendency to evolve as it would have without any influence. To say that this will be of no concern to us here would be somewhat untrue. In fact, the central goal of this thesis is to show that, under the assumption that error-free processing of quantum information is possible, one can protect quantum information against its natural tendency to interfere with other quantum information and with the environment, and correct the resulting errors. Indeed, we will assume that it is possible to process quantum information fault tolerantly.

If the state of $A$ is specified by a density matrix $\rho^A$, the unitary channel acts as

$$U: \rho^A \mapsto \rho'^A = U\rho U^\dagger.$$

In other words, $\rho$ transforms according to the adjoint map associated to $U$. We will frequently abbreviate this map as

$$\mathcal{U}(\rho) = U\rho U^\dagger.$$
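A minimal NumPy sketch of the unitary channel $\mathcal{U}(\rho) = U\rho U^\dagger$, using the Hadamard matrix and an arbitrary qubit density matrix as assumed examples. The output has the same trace and the same eigenvalues as the input.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard, an example unitary
rho = np.array([[0.9, 0.1], [0.1, 0.1]])       # an example density matrix

def unitary_channel(U, rho):
    """The adjoint action U(rho) = U rho U^dagger."""
    return U @ rho @ U.conj().T

out = unitary_channel(H, rho)
print(np.round(out, 3))
```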

It will often be useful for us to speak of unitaries between quantum systems. For example, we may think of a quantum system $A$ at some time $t$ being turned into another quantum system $B$ at a later time $t'$, where $|A| = |B|$. If this process acts unitarily, we will write

$$U: A \to B$$

for the associated unitary channel. As an example, consider a physical scenario in which an electron placed at a position $x$ at time $t$ is transferred to some other position $x'$ by some later time $t + T$, after having been rotated by $180^\circ$ about its $z$ axis. The quantum system $A$ thus represents the original preparation of the electron at $x$, while $B$ represents the evolved electron, $T$ seconds later, with its new state at position $x'$.
2.3.4 Quantum channels

Quantum channels represent physical processes which transfer quantum states forward in time. The state at the output of the channel will be some noisy version of what was put in. Examples include an optical fiber over which the polarization of an input photon may become corrupted by noise, or a quantum dot which will hold a single electron for an uncertain amount of time.

Here, we will give a precise mathematical definition of quantum channels as functions from the density matrices of an input quantum system to the density matrices of an output quantum system, generalizing the notion of discrete memoryless classical channels described in Section 2.3.1, which map probability mass functions on the input alphabet to probability mass functions on the output. The mathematical properties which we require a channel to satisfy are from the standard literature on open quantum systems and quantum information theory, so much of the content here is
presented without proof. Some standard references for this material include [SHEHl- 

By a quantum channel $\mathcal{N}: A \to B$, we mean a mathematical object which maps density matrices on $A$ to density matrices on $B$ while satisfying the three physically motivated properties described below.

Property (Linearity).

$$\mathcal{N}: \mathbb{C}^{|A|\times|A|} \to \mathbb{C}^{|B|\times|B|}$$

is a linear map, so that

$$\mathcal{N}\Big(\sum_i \alpha_i \rho_i\Big) = \sum_i \alpha_i\,\mathcal{N}(\rho_i).$$

Property (Trace preservation). $\mathcal{N}$ preserves the trace of the input density operator:

$$\mathrm{Tr}\,\rho = \mathrm{Tr}\,\mathcal{N}(\rho).$$

This technical requirement will sometimes be relaxed to the requirement that $\mathcal{N}$ only be trace-non-increasing:

$$\mathrm{Tr}\,\rho \ge \mathrm{Tr}\,\mathcal{N}(\rho).$$

With a slight loss in pedantry, we will generally refer to such maps as trace-reducing.
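In such a case, $\mathcal{N}$ can be interpreted as a channel which is executed with some probability less than one. To introduce the third property, let us show that there is a unique way in which $\mathcal{N}$ acts on the $A$ part of a composite quantum system $AC$. It is sufficient to see what happens when acting upon part of a pure state

$$|\Psi\rangle^{AC} = \sum_{ac} \psi_{ac}\,|a\rangle^A|c\rangle^C.$$

Here, we obtain

$$(\mathcal{N}\otimes 1^C)(\Psi) = (\mathcal{N}\otimes 1^C)\Big(\sum_{a'ac'c} \psi_{a'c'}\psi_{ac}^*\,|a'\rangle\langle a| \otimes |c'\rangle\langle c|\Big) = \sum_{a'ac'c} \psi_{a'c'}\psi_{ac}^*\,\mathcal{N}(|a'\rangle\langle a|) \otimes |c'\rangle\langle c|.$$

The action of $\mathcal{N}\otimes 1^C$ is then uniquely defined on any density matrix $\Omega^{AC}$ by first writing any pure state decomposition

$$\Omega^{AC} = \sum_i p_i\,\Psi_i.$$

Then, by linearity,

$$(\mathcal{N}\otimes 1^C)(\Omega) = \sum_i p_i\,(\mathcal{N}\otimes 1^C)(\Psi_i).$$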

Now, we can state the third characteristic property of a quantum channel.

Property (Complete positivity). The channel must be completely positive, meaning that not only must $\mathcal{N}: A \to B$ take nonnegative definite matrices on $A$ to nonnegative definite matrices on $B$, but for any $C$ it must take nonnegative definite matrices on $AC$ to nonnegative definite matrices on $BC$.
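Complete positivity is strictly stronger than positivity. A standard example (not taken from the original text) is the transpose map $T(\rho) = \rho^T$: it sends density matrices to density matrices, yet $T\otimes 1$ applied to a maximally entangled state produces a matrix with a negative eigenvalue. The NumPy sketch below assumes the ordering $A\otimes C$ with $A$ first; `transpose_on_A` is a hypothetical helper name.

```python
import numpy as np

def transpose_on_A(omega, dA, dC):
    """(T (x) id)(omega): transpose only the A factor of an operator on A (x) C."""
    t = omega.reshape(dA, dC, dA, dC)
    return t.transpose(2, 1, 0, 3).reshape(dA * dC, dA * dC)

d = 2
phi = np.eye(d).reshape(d * d) / np.sqrt(d)     # (|00> + |11>)/sqrt(2)
Phi = np.outer(phi, phi)

rho = np.array([[0.7, 0.2], [0.2, 0.3]])        # an example qubit state
print(np.linalg.eigvalsh(rho.T).min() >= 0)                # T alone is positive
print(np.linalg.eigvalsh(transpose_on_A(Phi, d, d)).min()) # negative: T is not CP
```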

A physically satisfying consequence of these three properties is that if a quantum 
channel acts on part of a convex combination of density matrices, the resulting op- 
erator will be a density matrix. Quantum channels also obey the following locality 
properties, which can be derived from the above three. We will later invoke these 
(quite frequently) without reference. 
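Property (Locality I). Given a bipartite density matrix $\rho^{AB}$ and two quantum channels

$$\mathcal{N}: A \to C \quad\text{and}\quad \mathcal{M}: B \to D,$$

the actions of $\mathcal{N}$ and $\mathcal{M}$ commute with one another, i.e.

$$(\mathcal{N}\otimes 1^D)\circ(1^A\otimes\mathcal{M}) = (1^C\otimes\mathcal{M})\circ(\mathcal{N}\otimes 1^B) = \mathcal{N}\otimes\mathcal{M}.$$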

These equations are summarized by the leftmost commutative diagram below. The 
rightmost diagram is to remind the reader of the subsystems on which the correspond- 
ing states are defined. 




Property (Locality II). Given a bipartite density matrix $\rho^{AB}$, a local operation on $B$ will not affect the reduced density matrix on $A$, i.e. given a quantum channel $\mathcal{N}: B \to C$, we have

$$\mathrm{Tr}_C\,(1^A\otimes\mathcal{N})(\rho) = \mathrm{Tr}_B\,\rho.$$

This is summarized by the commutative diagram on the left below. On the right, we 
remind the reader of the subsystems involved. 




This last property can be paraphrased as stating that $\mathrm{Tr}_B\,\rho^{AB}$ is independent of any physical process which is carried out on $B$. Throughout this dissertation, we will often omit identity maps in expressions such as $1^A\otimes\mathcal{N}$, so that $\mathcal{N}: B \to C$ will be interpreted as the map $1^A\otimes\mathcal{N}: AB \to AC$ whenever necessary. An advantage of this approach is that it allows long expressions to be simplified. This leaves no room for ambiguity, as the action of a channel on part of a larger system is always uniquely defined.

2.3.5 Representing quantum channels 

In this section, we review two useful representation theorems for quantum channels. 
The first, due to Stinespring, shows how unitary processes can give rise to quantum 
channels. The second, due to Kraus, shows how quantum channels can be viewed as 
measuring devices which "forget", or "keep secret", the measurement result. 

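Suppose that a quantum system $A$ is prepared and allowed to evolve unitarily, according to a unitary $U: AE \to AE$, together with some extra system $E$ which is promised to be prepared in some known pure state $|1\rangle^E$. Since $E$ is guaranteed to be in the same state before the application of $U$, some of the elements of $U$ are irrelevant to the dynamics. For example, fixing bases $\{|a\rangle^A\}$ and $\{|e\rangle^E\}$, suppose that $U$ is given by

$$U = \sum_{ae} |\phi_{ae}\rangle^{AE}\,\langle a|^A\langle e|^E$$

for some other orthonormal basis $\{|\phi_{ae}\rangle^{AE}\}$ of the combined system $AE$. Then, an arbitrary pure state of $A$,

$$|\psi\rangle^A = \sum_a \alpha_a\,|a\rangle^A,$$

will be mapped to

$$U|\psi\rangle^A|1\rangle^E = \sum_{ae} |\phi_{ae}\rangle^{AE}\langle a|^A\langle e|^E\Big(\sum_{a'} \alpha_{a'}|a'\rangle^A\Big)|1\rangle^E = \sum_{aea'} \alpha_{a'}\langle a|a'\rangle\langle e|1\rangle\,|\phi_{ae}\rangle^{AE} = \sum_a \alpha_a\,|\phi_{a1}\rangle^{AE}.$$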
Thus, only $|A|$ of the columns of $U$ (those with $e = 1$) are relevant to this situation. Keeping only this "chunk" of the unitary $U$ defines an isometry $\mathcal{U}: A \to AE$. Mathematically, $V: A \to B$ is an isometry if and only if it satisfies one (and thus both) of the following conditions:

$$V^\dagger V = 1^A \quad\text{and}\quad VV^\dagger = \Pi^B.$$

Above, $\Pi^B$ is a projection matrix on $B$ satisfying $\mathrm{Tr}\,\Pi^B = |A|$. In other words, an isometry is a length-preserving matrix whose range is a subspace of the target space, giving an image of the input space in the output space.

Returning to the isometry $\mathcal{U}: A \to AE$, consider what will happen if the extra system is disregarded. Given a density matrix $\rho^A$, a mapping $\mathrm{Tr}_E\circ\,\mathcal{U} = \mathcal{N}: A \to A$ results. This map $\mathcal{N}$ is a quantum channel, and the map $\mathcal{U}$ will be called an isometric extension of $\mathcal{N}$. We will generally use a subscript to identify the channel which is being extended, saying that $\mathcal{U}_\mathcal{N}$ isometrically extends $\mathcal{N}$. This way of representing a quantum channel is often referred to as the Stinespring representation, and we will use it almost exclusively throughout this dissertation. To be precise, we will often invoke the following proposition.

Proposition (Isometric extension representation). A map $\mathcal{N}: A \to B$ is a quantum channel if and only if there exists an isometric extension $\mathcal{U}_\mathcal{N}: A \to BE$ of $\mathcal{N}$.

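Remark. In general, an isometric extension $\mathcal{U}_\mathcal{N}$ of $\mathcal{N}$ is not unique. This can be seen by defining $\mathcal{W}_\mathcal{N} = \mathcal{V}\circ\mathcal{U}_\mathcal{N}$, where $\mathcal{V}: E \to E'$ is any isometry into a (potentially different) environment $E'$. Since $\mathrm{Tr}_E\,\mathcal{U}_\mathcal{N} = \mathrm{Tr}_{E'}\,\mathcal{V}\circ\mathcal{U}_\mathcal{N}$, these extend the same channel $\mathcal{N}$.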

Another way to represent a quantum channel is due to Kraus, and is called the operator sum representation (OSR). The following proposition was first proved in [?].

Proposition (Operator sum representation). A map $\mathcal{N}: A \to B$ is a quantum channel if and only if it can be written as

$$\mathcal{N}(\rho) = \sum_{i=1}^k N_i\,\rho\,N_i^\dagger$$

for matrices $\{N_i \in \mathbb{C}^{|B|\times|A|}\}$ which satisfy

$$\sum_{i=1}^k N_i^\dagger N_i = 1^A.$$

The matrices $\{N_i\}$ are called the operator sum matrices (OSR matrices) of the representation. Such a representation of $\mathcal{N}$ is generally not unique. It should be mentioned that this representation bears a strong resemblance to the measurement model of POVMs given in Section 2.1.6.

For a given channel, the two representations given above are intimately related, and having at hand one representation immediately gives the other as follows. If the action of $\mathcal{N}$ can be expressed in terms of OSR matrices $\{N_i\}_{i=1}^k$, an isometric extension $\mathcal{U}_\mathcal{N}: A \to BE$ into an environment of size $|E| = k$ can be constructed as

$$U_\mathcal{N} = \sum_{i=1}^k N_i \otimes |i\rangle^E.$$

This is perhaps more easily expressed by writing $U_\mathcal{N}$ as a block matrix (in the flattened representation, with the environment index labeling the blocks), with blocks given by the OSR matrices as

$$U_\mathcal{N} = \begin{pmatrix} N_1 \\ \vdots \\ N_k \end{pmatrix}.$$

Note that the dimensions match up; namely, $U_\mathcal{N} \in \mathbb{C}^{|E||B|\times|A|}$. The reverse is also true, and the construction just involves identifying the OSR matrices with the corresponding blocks of a given isometric extension $U_\mathcal{N}$.
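The correspondence between OSR matrices and isometric extensions can be sketched in NumPy. As an assumed example we use the qubit amplitude-damping channel with damping parameter `gamma`. Stacking its OSR matrices vertically gives $U_\mathcal{N}$ with the environment index labeling the blocks, so in this code $E$ is the first tensor factor of the output (ordering $E\otimes B$); this is a convention choice made for the sketch.

```python
import numpy as np

gamma = 0.3                                   # damping parameter (example)
kraus = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]]),
         np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])]
k, dB, dA = len(kraus), 2, 2

# OSR condition: sum_i N_i^dagger N_i = 1^A
assert np.allclose(sum(N.conj().T @ N for N in kraus), np.eye(dA))

# Stack the OSR matrices as blocks; rows are indexed by (i, b), E first.
V = np.vstack(kraus)                          # shape (k * dB, dA)
print(np.allclose(V.conj().T @ V, np.eye(dA)))          # V is an isometry

rho = np.array([[0.4, 0.3], [0.3, 0.6]])      # example input state
out = (V @ rho @ V.conj().T).reshape(k, dB, k, dB)
channel_out = np.einsum('ibic->bc', out)      # Tr_E recovers N(rho)
osr_out = sum(N @ rho @ N.conj().T for N in kraus)
print(np.allclose(channel_out, osr_out))
```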

Remark. As the nonuniqueness of isometric extensions is due to the isometric free- 
dom in describing the environment, the operator sum representation inherits this 
freedom as well. 
2.3.6 Complementary channels

Suppose that a channel $\mathcal{N}: A \to B$ is given. Fixing an isometric extension $\mathcal{U}_\mathcal{N}: A \to BE$ of $\mathcal{N}$, define the channel $\mathcal{N}^c: A \to E$ via $\mathcal{N}^c = \mathrm{Tr}_B\circ\,\mathcal{U}_\mathcal{N}$. We will say that the channel $\mathcal{N}^c$ is complementary to $\mathcal{N}$. If the channel acts on a density matrix $\rho^A$, the state $\mathcal{N}^c(\rho)$ on $E$ can be thought of as the disturbance induced into an initially pure environment by the action of the channel.

Remark. While the choice of complementary channel is generally not unique, it is unique up to isometries on $E$, inheriting this freedom from the choice of isometric extension.
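Using the same assumed amplitude-damping example, the complementary channel is obtained from the stacked isometry by tracing out $B$ instead of $E$. In this NumPy sketch (again with $E$ as the first output factor), the resulting $k\times k$ matrix has entries $\mathrm{Tr}(N_i\rho N_j^\dagger)$; the function name `complementary` is hypothetical.

```python
import numpy as np

gamma = 0.3                                   # damping parameter (example)
kraus = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]]),
         np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])]
k, dB = len(kraus), 2
V = np.vstack(kraus)                          # isometric extension A -> EB (E first)

def complementary(rho):
    """N^c(rho) = Tr_B V rho V^dagger, a k x k density matrix on E."""
    out = (V @ rho @ V.conj().T).reshape(k, dB, k, dB)
    return np.einsum('ibjb->ij', out)

rho = np.array([[0.4, 0.3], [0.3, 0.6]])      # example input state
env = complementary(rho)
# Entry (i, j) should equal Tr(N_i rho N_j^dagger).
expected = np.array([[np.trace(Ni @ rho @ Nj.conj().T) for Nj in kraus]
                     for Ni in kraus])
print(np.allclose(env, expected), np.isclose(np.trace(env), 1))
```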

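2.3.7 Controlled quantum channels and cq → q channels

Consider a collection of quantum channels $\{\mathcal{M}_x: A \to B\}_{x\in\mathcal{X}}$, labeled by a finite set $\mathcal{X}$. Introducing a controlling classical system $X$, available at the input and output, the collection of channels can be represented by a controlled channel $\mathcal{M}: XA \to XB$. This channel acts on a cq state

$$\sigma^{XA} = \sum_x p(x)\,|x\rangle\langle x|^X \otimes \sigma_x^A$$

as

$$\mathcal{M}(\sigma) = \sum_x p(x)\,|x\rangle\langle x|^X \otimes \mathcal{M}_x(\sigma_x).$$

If the controlling system $X$ is not available at the output, the action of the channel is modified to

$$\mathcal{M}'(\sigma) = \mathrm{Tr}_X\,\mathcal{M}(\sigma) = \sum_{x\in\mathcal{X}} p(x)\,\mathcal{M}_x(\sigma_x).$$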

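We will show next that for any quantum channel $\mathcal{N}: XB \to C$ which is only intended to act on cq states, less data is required to specify the action of the channel. In such a case, the channel can be represented in the same fashion as $\mathcal{M}'$, in the sense that the action of $\mathcal{N}$ on $\sigma^{XB}$ decomposes as

$$\mathcal{N}(\sigma) = \sum_x p(x)\,\mathcal{N}_x(\sigma_x)$$

for some channels $\{\mathcal{N}_x: B \to C\}_{x\in\mathcal{X}}$. To see this, suppose that $\mathcal{N}: XB \to C$ has an operator sum decomposition

$$\mathcal{N}(\sigma) = \sum_{i=1}^d N_i\,\sigma\,N_i^\dagger,$$

where the $|C| \times |\mathcal{X}||B|$-dimensional matrices $N_i$ satisfy $\sum_{i=1}^d N_i^\dagger N_i = 1^{XB}$. Consider each $N_i$ to be composed of $|\mathcal{X}|$ blocks of size $|C|\times|B|$, as

$$N_i = \begin{pmatrix} N_{i1} & N_{i2} & \cdots & N_{i|\mathcal{X}|} \end{pmatrix}.$$

The action of $\mathcal{N}$ on $\sigma$ then simplifies as

$$\mathcal{N}(\sigma) = \sum_{i=1}^d N_i\Big(\sum_{x\in\mathcal{X}} p(x)\,|x\rangle\langle x| \otimes \sigma_x\Big)N_i^\dagger = \sum_{i=1}^d \sum_{x\in\mathcal{X}} p(x)\,N_{ix}\,\sigma_x\,N_{ix}^\dagger = \sum_{x\in\mathcal{X}} p(x) \sum_{i=1}^d N_{ix}\,\sigma_x\,N_{ix}^\dagger,$$

where in the last step we identify, for each $x$, the matrices $\{N_{ix}\}_{i=1}^d$ as the OSR matrices of a trace-preserving map $\mathcal{N}_x$. Trace preservation holds because $\sum_i N_{ix}^\dagger N_{ix}$ is the $x$th diagonal block of $\sum_i N_i^\dagger N_i = 1^{XB}$, namely $1^B$.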
2.3.8 Quantum instruments (q → cq channels)

A quantum instrument $\mathcal{N}: A \to BX$ is a quantum channel whose output is a cq system. Mathematically, it is specified by a collection of completely positive, trace-reducing maps $\{\mathcal{N}_x: A \to B\}_{x\in\mathcal{X}}$, labeled by a finite set $\mathcal{X}$, such that their sum, which acts on an arbitrary input state $\rho^A$ as

$$\rho \mapsto \sum_x \mathcal{N}_x(\rho),$$

is trace preserving (and is thus a quantum channel). The action of the instrument on $\rho^A$ is given by

$$\mathcal{N}(\rho) = \sum_x |x\rangle\langle x|^X \otimes \mathcal{N}_x(\rho).$$

The measurement process can be modeled by a quantum instrument as follows. Given a POVM $\{\Lambda_x\}_{x\in\mathcal{X}}$, consider the quantum instrument $\mathcal{N}: A \to AX$ with components acting as

$$\mathcal{N}_x(\rho) = \sqrt{\Lambda_x}\,\rho\,\sqrt{\Lambda_x}.$$

Then, the action of $\mathcal{N}$ is just

$$\mathcal{N}(\rho) = \sum_x |x\rangle\langle x| \otimes \sqrt{\Lambda_x}\,\rho\,\sqrt{\Lambda_x} = \sum_x \mathrm{Tr}(\Lambda_x\rho)\,|x\rangle\langle x| \otimes \frac{\sqrt{\Lambda_x}\,\rho\,\sqrt{\Lambda_x}}{\mathrm{Tr}\,\Lambda_x\rho} = \sum_x p(x)\,|x\rangle\langle x| \otimes \rho_x,$$

where the $\{\rho_x\}$ are the post-measurement states. In other words, $\mathcal{N}_x(\rho)$ is an unnormalized density matrix satisfying

$$\mathrm{Tr}\,\mathcal{N}_x(\rho) = p(x),$$

which is proportional to the post-measurement state. We will later utilize such a quantum instrument in order to simultaneously decode classical and quantum information which have been transmitted over a quantum multiple access channel.

It is also possible to use an instrument to model a measurement which ignores the post-measurement state. This is done with a measuring instrument $\mathcal{M}: A \to X$, which can be defined in terms of the previous instrument as $\mathrm{Tr}_A\circ\,\mathcal{N}$. This simpler instrument acts as

$$\mathcal{M}(\rho) = \mathrm{Tr}_A \sum_x |x\rangle\langle x| \otimes \sqrt{\Lambda_x}\,\rho\,\sqrt{\Lambda_x} = \sum_x (\mathrm{Tr}\,\Lambda_x\rho)\,|x\rangle\langle x| = \sum_x p(x)\,|x\rangle\langle x|,$$

which is exactly as one would expect a measuring device to act.
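A minimal NumPy sketch of this instrument, assuming an example two-outcome POVM on a qubit. The hypothetical helper `psd_sqrt` computes $\sqrt{\Lambda_x}$ from an eigendecomposition; the loop produces the outcome probabilities $p(x) = \mathrm{Tr}\,\Lambda_x\rho$ and the normalized post-measurement states $\rho_x$.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a positive semidefinite Hermitian matrix via eigh."""
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.clip(w, 0, None))) @ U.conj().T

povm = [np.diag([0.9, 0.2]), np.diag([0.1, 0.8])]   # example POVM, sums to 1
rho = np.array([[0.5, 0.5], [0.5, 0.5]])            # |+><+|

probs, post = [], []
for Lam in povm:
    K = psd_sqrt(Lam)
    unnorm = K @ rho @ K.conj().T                   # N_x(rho) = sqrt(L) rho sqrt(L)
    probs.append(np.trace(unnorm).real)             # p(x) = Tr Lambda_x rho
    post.append(unnorm / probs[-1])                 # post-measurement state rho_x
print(np.round(probs, 3))
```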

As an instrument is also a channel, it makes sense to speak of an isometric exten- 
sion and complementary channel to an instrument. In the appendix f Section 
we will demonstrate that any channel complementary to an instrument is another 
instrument with similar structure. Namely, the components of the complementary 
instrument are obtained as complements of the components of the original instru- 
ment. 
Chapter 3
Entropy and information quantities

In this chapter, we review the notion of quantum entropy, as well as some related 
information theoretical quantities which characterize the capacities to be introduced 
later. 

3.1 Entropy 

Let $\mathcal{X}$ be a finite set, and let $X$ be an $\mathcal{X}$-valued random variable, distributed according to $p(x)$. The Shannon entropy of $X$ is defined as

$$H(X) = -\sum_x p(x)\log p(x).$$

All logarithms in this dissertation will be to the base 2 ($\log = \log_2$). Also, note that we will always take $0\log 0 = 0$, as $\lim_{x\to 0} x\log x = 0$, by continuity. Further note that $H(\cdot)$ does not depend on the values taken by $X$. Rather, it is a functional of the probability mass function $p(x)$ of $X$. Indeed, $\mathcal{X}$ is merely an abstract set whose elements are labels for events. For example, $X$ may be taken to represent the result of a fair coin flip, whereby $\mathcal{X} = \{\text{heads}, \text{tails}\}$ and $p(\text{heads}) = p(\text{tails}) = 1/2$. In this case, $H(X) = 1$ bit. One interpretation to be gained from this example is that we obtain a bit of information by learning the result of a fair coin flip. In this sense, the coin flip example defines a "unit of information" equal to 1 bit.
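The definition translates directly into code. This plain-Python sketch uses `math.log2` and skips zero-probability terms to implement the convention $0\log 0 = 0$; the fair coin recovers 1 bit.

```python
import math

def shannon_entropy(pmf):
    """H(X) = -sum_x p(x) log2 p(x), skipping p(x) = 0 (the 0 log 0 = 0 convention)."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

print(shannon_entropy([0.5, 0.5]))      # fair coin: 1.0
```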
$H(X)$ can also be interpreted as the number of bits, on average, required to represent the random variable $X$. Intuitively, entropy may be thought of as a measure of the amount of "information contained in" the random variable $X$. Made precise, this is a statement concerning the asymptotic statistics of sequences of i.i.d. random variables $X^n = (X_1, \ldots, X_n)$. Such an operational definition has its roots in the source coding theorem, which dates to Shannon's original paper, where the entropy was established as the fundamental limit on the compressibility of information. As this dissertation will focus on the closely related problem of channel coding, we will not pursue this interpretation further.

Suppose that a quantum system is prepared with density matrix $\rho$. We define the von Neumann entropy of $\rho$ as

$$H(\rho) = -\mathrm{Tr}\,\rho\log\rho.$$

Note that we overload the letter $H$ to mean both Shannon and von Neumann entropy. Writing an eigendecomposition of $\rho$ as

$$\rho = \sum_x p(x)\,|x\rangle\langle x|,$$

we obtain an ensemble of orthogonal pure states $\{p(x), |x\rangle\}$ which also gives rise to the density matrix $\rho$. The von Neumann entropy of $\rho$ is then equal to the Shannon entropy of the eigenvalues of $\rho$. Indeed,

$$H(\rho) = -\mathrm{Tr}\Big(\sum_x p(x)\,|x\rangle\langle x|\Big)\Big(\sum_{x'} \log p(x')\,|x'\rangle\langle x'|\Big) = -\sum_x p(x)\log p(x) = H(X),$$

where $X$ is a random variable with probability mass function $p(x)$.
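Numerically, von Neumann entropy is evaluated the same way: diagonalize $\rho$ and take the Shannon entropy of its eigenvalues. A NumPy sketch, in which the cutoff `tol` for discarding zero eigenvalues is an arbitrary tolerance chosen here:

```python
import numpy as np

def von_neumann_entropy(rho, tol=1e-12):
    """H(rho) = -Tr rho log2 rho, computed as the Shannon entropy of the eigenvalues.

    Eigenvalues below `tol` are discarded, implementing 0 log 0 = 0 up to rounding.
    """
    w = np.linalg.eigvalsh(rho)
    w = w[w > tol]
    return float(-np.sum(w * np.log2(w)))

print(von_neumann_entropy(np.eye(2) / 2))                        # maximally mixed qubit
print(von_neumann_entropy(np.array([[0.5, 0.5], [0.5, 0.5]])))   # pure state |+><+|
```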

If $\rho^A$ is associated with system $A$, we will often write $H(\rho) = H(A)_\rho$, omitting the subscript when the state is apparent from the context. Given some multipartite state $\Omega^{AB}$, the above notation gives a useful way to denote entropies of partial traces of $\Omega$. For example, $H(A)_\Omega = H(\mathrm{Tr}_B\,\Omega)$, while $H(AB)_\Omega = H(\Omega)$. We now state the following elementary properties of entropy. These are proved in many introductory textbooks such as [M].

Property (Entropy is nonnegative).

$$H(A) \ge 0$$

This bound is saturated if and only if $A$ is in a pure state $|\varphi\rangle^A$.

Property (Entropy is bounded).

$$H(A) \le \log|A|$$

This bound is saturated if and only if $A$ is prepared in the maximally mixed state $\pi^A = 1^A/|A|$.

Property (Entropy is subadditive).

$$H(AB) \le H(A) + H(B)$$

This bound is saturated if and only if $AB$ is prepared in a product state $\rho^A\otimes\sigma^B$.

Property (Lieb's inequality).

$$|H(A) - H(B)| \le H(AB)$$

Let us compute the entropy of a generic cq state

$$\rho^{XA} = \sum_x p(x)\,|x\rangle\langle x|^X \otimes \rho_x^A. \qquad (3.1)$$






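To do so, we first diagonalize each $\rho_x$ as

$$\rho_x = \sum_y p_x(y)\,|y_x\rangle\langle y_x|, \qquad (3.2)$$

where for each $x$, the vectors $\{|y_x\rangle\}_y$ form (generally) different orthonormal bases for $A$. Then, since $\rho^{XA}$ is block-diagonal, we may write

$$H(XA)_\rho = -\mathrm{Tr}\sum_x p(x)\,|x\rangle\langle x| \otimes \rho_x\log\big(p(x)\rho_x\big) = -\sum_{xy} p(x)p_x(y)\log\big(p(x)p_x(y)\big) = -\sum_x p(x)\Big(\log p(x) + \sum_y p_x(y)\log p_x(y)\Big) = H(X) + \sum_x p(x)H(\rho_x).$$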
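The identity $H(XA) = H(X) + \sum_x p(x)H(\rho_x)$ can be checked numerically on a small example. The states below (a pure state and a diagonal mixed state) are arbitrary illustrative choices, and `vn_entropy` is an eigenvalue-based helper defined here so that the snippet is self-contained.

```python
import numpy as np

def vn_entropy(rho):
    """Von Neumann entropy in bits from the eigenvalues of rho."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

p = np.array([0.5, 0.5])                              # example p(x)
rhos = [np.array([[0.5, 0.5], [0.5, 0.5]]),           # pure state |+><+|
        np.array([[0.7, 0.0], [0.0, 0.3]])]           # a mixed state
E = np.eye(len(p))
rho_XA = sum(px * np.kron(np.outer(E[x], E[x]), r)
             for x, (px, r) in enumerate(zip(p, rhos)))

lhs = vn_entropy(rho_XA)                                            # H(XA)
rhs = vn_entropy(np.diag(p)) + sum(px * vn_entropy(r) for px, r in zip(p, rhos))
print(np.isclose(lhs, rhs))
```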



Together with subadditivity, the calculation of the joint entropy of a cq state allows a simple proof of the concavity of entropy.

Property (Concavity of entropy).

$$H\Big(\sum_x p_x\rho_x\Big) \ge \sum_x p_x H(\rho_x)$$

Proof. Consider the cq state $\rho^{XA}$ from (3.1). Beginning with subadditivity, we have

$$H(X) + \sum_x p(x)H(\rho_x) = H(XA) \le H(X) + H(A) = H(X) + H\Big(\sum_x p(x)\rho_x\Big).$$

Subtracting $H(X)$ from each side completes the argument.




Property. For any unitary $U$ on $A$, $H(\mathcal{U}(\rho)) = H(\rho)$.

Proof. The eigenvalues of $\rho$ and of $\mathcal{U}(\rho)$ are the same.



3.2 Conditional entropy

Let us begin by making the following formal definition for the conditional entropy:

$$H(A|B) = H(AB) - H(B).$$

By the calculation of $H(XA)$ for the cq state (3.1) from the previous section,

$$H(A|X) = \sum_x p(x)H(\rho_x).$$

Observe that $H(A|X)$ is equal to the average entropy of $A$, averaged over the classical part of the cq state. In classical information theory, conditional entropy is often defined as

$$H(Y|X) = -\sum_{xy} p(x,y)\log p(y|x).$$

It is interesting to note that if we start with the cq state $\rho^{XA}$ from (3.1), we may define a random variable $Y$ which is jointly distributed with $X$ in accordance with the conditional distribution $p(y|x) = p_x(y)$, using the notation from (3.2). The equality $H(A|X=x) = H(Y|X=x)$ holds, and thus $H(A|X) = H(Y|X)$ holds as well.

However, for an arbitrary state on $AB$, this interpretation of $H(A|B)$ as an average entropy is not valid. In particular, suppose that $|A| = |B| = 2$, and that $AB$ is in the maximally entangled pure state

$$|\Psi\rangle^{AB} = \frac{1}{\sqrt{2}}\big(|0\rangle^A|0\rangle^B + |1\rangle^A|1\rangle^B\big).$$

Since $\mathrm{Tr}_A\,\Psi = \pi^B$, it follows that for this state,

$$H(A|B) = H(\Psi) - H(\pi^B) = 0 - 1 = -1.$$
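The value $H(A|B) = -1$ for this example can be reproduced in a few lines of NumPy, using the definition $H(A|B) = H(AB) - H(B)$ and an eigenvalue-based entropy helper (an assumption of this sketch, not notation from the text):

```python
import numpy as np

def vn_entropy(rho):
    """Von Neumann entropy in bits from the eigenvalues of rho."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)      # (|00> + |11>)/sqrt(2)
Psi = np.outer(phi, phi)
rho_B = np.einsum('abac->bc', Psi.reshape(2, 2, 2, 2))  # Tr_A Psi = 1/2

cond = vn_entropy(Psi) - vn_entropy(rho_B)              # H(A|B) = H(AB) - H(B)
print(round(cond, 6))                                   # -1.0
```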



Defined in this formal way, conditional entropy can in fact be negative! As we will see in Sections 3.4 and 4.3, the negative of $H(A|B)$, referred to as the coherent information, plays a role in characterizing the quantum capacity of a quantum channel.

Let us conclude our discussion by noting the following property of conditional 
entropy. A proof can be found in 

Property. $H(A|B)_\rho$ is concave as a function of $\rho^{AB}$.



3.3 Mutual Information

Given two random variables $X$ and $Y$, jointly distributed according to $p(x,y)$, the mutual information $I(X;Y)$ measures the amount of correlation between the two random variables. $I(X;Y)$ is typically defined as an expected log-likelihood ratio

$$I(X;Y) = \sum_{xy} p(x,y)\log\frac{p(x,y)}{p(x)p(y)}.$$

Simple algebraic manipulations yield the following alternative formulas for $I(X;Y)$:

$$I(X;Y) = H(X) + H(Y) - H(XY) = H(X) - H(X|Y) = H(Y) - H(Y|X).$$
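For a concrete check that the definition and the three entropy formulas agree, this NumPy sketch uses an assumed example, a binary symmetric channel with crossover probability 0.1 and uniform input, for which $I(X;Y) = 1 - H(0.1) \approx 0.531$ bits.

```python
import numpy as np

def H(p):
    """Shannon entropy in bits of a pmf given as an array (0 log 0 = 0)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

eps = 0.1                                   # crossover probability (example)
p_x = np.array([0.5, 0.5])                  # input distribution (example)
p_y_given_x = np.array([[1 - eps, eps],     # rows indexed by x, columns by y
                        [eps, 1 - eps]])
p_xy = p_x[:, None] * p_y_given_x           # p(x, y) = p(x) p(y|x)
p_y = p_xy.sum(axis=0)

I_llr = float(np.sum(p_xy * np.log2(p_xy / np.outer(p_x, p_y))))   # definition
I_1 = H(p_x) + H(p_y) - H(p_xy)                                   # H(X)+H(Y)-H(XY)
I_2 = H(p_x) - (H(p_xy) - H(p_y))                                 # H(X)-H(X|Y)
I_3 = H(p_y) - (H(p_xy) - H(p_x))                                 # H(Y)-H(Y|X)
print(round(I_llr, 4), np.allclose([I_1, I_2, I_3], I_llr))
```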

Given a stochastic matrix $p(y|x)$ of conditional probabilities, further specification of an input distribution $p(x)$ determines a joint distribution $p(x,y)$ for the random variables $X$ and $Y$. In Section 4.1 we will see that the capacity of a classical channel with transition matrix $p(y|x)$ is given by the expression

$$C = \max_{p(x)} I(X;Y).$$

A similar expression can be given for the capacity of a c → q channel, in terms of the quantum mutual information evaluated on cq states.

Rather than define quantum mutual information in terms of a log-likelihood ratio, we opt here to give the following algebraic definition, valid for any composite quantum system $AB$:

$$I(A;B) = H(A) + H(B) - H(AB).$$

Using the formal definition of conditional quantum entropy from the previous section, we could have equivalently defined $I(A;B)$ as

$$I(A;B) = H(A) - H(A|B)$$

or as

$$I(A;B) = H(B) - H(B|A).$$

Most relevant to this dissertation is the evaluation of mutual information on a cq state such as

$$\rho^{XB} = \sum_x p(x)\,|x\rangle\langle x|^X \otimes \rho_x^B.$$

With respect to $\rho^{XB}$, let us evaluate

$$I(X;B) = H(B) - H(B|X) = H\Big(\sum_x p(x)\rho_x\Big) - \sum_x p(x)H(\rho_x).$$

Together with the c → q channel $X \to B$ defined by the conditional density matrices $\{\rho_x\}$, the cq state $\rho^{XB}$ represents the joint distribution on the input and output of the channel, serving the same purpose that $p(x,y) = p(x)p(y|x)$ did in the purely classical case. In fact, an analogous capacity formula is obtainable as well:

$$C = \max_{p(x)} I(X;B).$$
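Evaluating $I(X;B)$ on an example c → q channel illustrates the formula. The sketch assumes the channel that prepares $|0\rangle$ or $|+\rangle$ with equal probability; since both outputs are pure, $H(B|X) = 0$ and $I(X;B)$ reduces to the entropy of the average state. The second computation cross-checks the result via $H(X) + H(B) - H(XB)$ on the cq state, with $X$ as the first tensor factor.

```python
import numpy as np

def vn_entropy(rho):
    """Von Neumann entropy in bits from the eigenvalues of rho."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
rhos = [np.outer(ket0, ket0), np.outer(ketp, ketp)]   # example c -> q channel
p = np.array([0.5, 0.5])                              # example input distribution

avg = sum(px * r for px, r in zip(p, rhos))
I_holevo = vn_entropy(avg) - sum(px * vn_entropy(r) for px, r in zip(p, rhos))

# Cross-check on the cq state rho^{XB} (X is the first tensor factor).
E = np.eye(len(p))
rho_XB = sum(px * np.kron(np.outer(E[x], E[x]), r)
             for x, (px, r) in enumerate(zip(p, rhos)))
I_joint = vn_entropy(np.diag(p)) + vn_entropy(avg) - vn_entropy(rho_XB)
print(round(I_holevo, 4), np.isclose(I_holevo, I_joint))
```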
This capacity is easily computable, as a consequence of the first of the following two convexity properties enjoyed by $I(X;B)$.

Property. For a fixed c → q channel $X \to B$ defined by the conditional density matrices $\{\rho_x^B\}$, $I(X;B)$ is a concave function of $p(x)$.

Proof. As $\rho^B = \sum_x p(x)\rho_x$ is linear in $p(x)$, and $H(B)$ is concave in $\rho^B$, $H(B)$ is concave in $p(x)$. But $H(B|X) = \sum_x p(x)H(\rho_x)$ is linear in $p(x)$, completing the argument. ■

Property. For a fixed input distribution $p(x)$, $I(X;B)$ is a convex function of the c → q channel $X \to B$.

Proof. Since $I(X;B) = H(X) - H(X|B)$ and $H(X)$ is fixed, this follows because $H(X|B)$ is a concave function of $\rho^{XB}$, which is itself linear in the conditional density matrices $\{\rho_x\}$. ■

For an arbitrary quantum channel $\mathcal{N}: A \to B$, specification of a collection of input states $\{\rho_x^A\}$, or equivalently, of a c → q channel with those conditional density matrices, yields a new c → q channel $X \to B$ with conditional density matrices $\{\mathcal{N}(\rho_x)\}$. This channel is mathematically equivalent to the composed actions of the c → q and quantum channels. By the discussion above, optimization over input distributions $p(x)$ then gives the classical capacity of the newly constructed c → q channel. However, the ultimate capacity of the quantum channel involves an optimization over collections of input states. Convexity of quantum mutual information in the c → q channel, which depends linearly on the input states, implies that extremal ensembles maximize capacity; such are ensembles of pure states. However, whether or not a single-letter converse can be obtained in this case remains a very important open problem in quantum information theory. As a result, the best known characterization of the capacity of a quantum channel for the transmission of classical information is

$$C(\mathcal{N}) = \lim_{k\to\infty}\frac{1}{k}\max I(X;B^k)_\sigma,$$

where for each $k$, the maximization is over all pure state ensembles $\{p(x), |\phi_x\rangle^{A^k}\}$
consisting of lA"! < min{|y4|, — 1 states. The mutual information is evaluated 

with respect to the corresponding cq states

$$\sigma^{XB^k} = \sum_x p(x)\,|x\rangle\langle x|^X \otimes \mathcal{N}^{\otimes k}(\phi_x).$$

A state such as $\sigma$ will be said to arise from the channels $\mathcal{N}^{\otimes k}$ in the above sense.

3.4 Coherent Information 

Suppose a channel $\mathcal{N}: A' \to B$ is given. Fix an isometric extension $\mathcal{U}_\mathcal{N}: A' \to BE$, and let $\mathcal{N}^c = \mathrm{Tr}_B\circ\,\mathcal{U}_\mathcal{N}$ be the associated complementary channel. For a given input density operator $\rho^{A'}$, the coherent information is defined as

$$I_c(\rho,\mathcal{N}) = H(\mathcal{N}(\rho)) - H(\mathcal{N}^c(\rho)).$$

Since any two complementary channels are equivalent up to an isometry on $E$, and since isometries preserve entropy, this quantity is independent of the particular complementary channel $\mathcal{N}^c$ chosen for the calculation. $H(\mathcal{N}^c(\rho))$ is frequently referred to as the entropy exchange associated with sending a system with density matrix $\rho$ over the channel $\mathcal{N}$.

Coherent information can be used to characterize the capacity of a quantum channel for transmitting quantum information as

$$Q(\mathcal{N}) = \lim_{k\to\infty}\frac{1}{k}\max_{\rho^{A'^k}} I_c(\rho,\mathcal{N}^{\otimes k}).$$

In Section 4.3 we will give an operational definition of quantum capacity, as well as a discussion of the proof of this capacity formula. It should be noted that this multi-letter characterization is the most general expression known for an arbitrary quantum channel. However, as we illustrate in Sections 9.2 and 9.3, there are classes of channels for which a single-letter expression suffices.

Let us now explore other ways of writing $I_c(\rho,\mathcal{N})$. With respect to the joint output-environment state $\mathcal{U}_\mathcal{N}(\rho)$ on $BE$, observe that

$$H(\mathcal{N}(\rho)) = H(B) \quad\text{and}\quad H(\mathcal{N}^c(\rho)) = H(E).$$

Then,

$$I_c(\rho,\mathcal{N}) = H(B) - H(E).$$

It is possible to write this quantity without making explicit mention of the environment. To do this, first fix any purification $|\Phi\rangle^{AA'}$ of $\rho^{A'}$. Then, use this to write a global pure state

$$|\Omega\rangle^{ABE} = (1^A \otimes U_\mathcal{N})\,|\Phi\rangle^{AA'}.$$

Since $|\Omega\rangle^{ABE}$ is pure, it follows that $H(E) = H(AB)$. This allows us to rewrite

$$H(B) - H(E) = H(B) - H(AB) = -H(A|B).$$
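The purification-based formula $I_c(\rho,\mathcal{N}) = H(B) - H(AB)$ can be sketched in NumPy for channels given by OSR matrices. Here $A$ is a reference system of the same dimension as $A'$, the purification is built from an eigendecomposition of $\rho$, and the two example channels (the identity and complete dephasing on a qubit, with $\rho = 1/2$) are assumptions chosen so that the expected values, 1 and 0, are easy to verify by hand.

```python
import numpy as np

def vn_entropy(rho):
    """Von Neumann entropy in bits from the eigenvalues of rho."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

def coherent_information(rho, kraus):
    """I_c(rho, N) = H(B) - H(AB) on omega^{AB} = (1^A (x) N)(Phi^{AA'}).

    |Phi>^{AA'} = sum_i sqrt(lam_i) |i>^A |e_i>^{A'} purifies rho, where
    rho = sum_i lam_i |e_i><e_i|; A is a reference system with dim A = dim A'.
    """
    d = rho.shape[0]
    lam, vecs = np.linalg.eigh(rho)
    basis = np.eye(d)
    phi = sum(np.sqrt(max(l, 0.0)) * np.kron(basis[i], vecs[:, i])
              for i, l in enumerate(lam))
    Phi = np.outer(phi, phi.conj())
    omega = sum(np.kron(basis, N) @ Phi @ np.kron(basis, N).conj().T for N in kraus)
    dB = kraus[0].shape[0]
    omega_B = np.einsum('abac->bc', omega.reshape(d, dB, d, dB))   # Tr_A
    return vn_entropy(omega_B) - vn_entropy(omega)

rho = np.eye(2) / 2
identity = [np.eye(2)]
dephasing = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
print(coherent_information(rho, identity), coherent_information(rho, dephasing))
```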

Remark. Written this way, it is clear that coherent information can be positive or 
negative. However, Q{M) > for every channel A/", as Ic{\4>)^\N') = for every pure 
state 10)^'. 

Observe that since any two purifications of $\rho^{A'}$ are the same up to local unitaries on $A$, and such unitaries preserve $H(AB)$, this last expression is independent of the particular purification $|\Psi\rangle^{AA'}$ chosen for $\rho^{A'}$. Further note that to compute $-H(A|B)$, it suffices to consider the joint state

$$ \omega^{AB} = \mathrm{Tr}_E\, \Omega^{ABE} = \mathcal{N}(\Psi^{AA'}). $$
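As a numerical sanity check that $H(B) - H(E)$ and $H(B) - H(AB)$ agree, the Python sketch below (using NumPy, entropies in bits) evaluates both for one example. The amplitude damping channel, its parameter and the input state are illustrative choices that do not come from the text; the environment is modeled by the standard isometry $U = \sum_k K_k \otimes |k\rangle^E$ built from a Kraus decomposition, which is an assumption of the sketch.

```python
import numpy as np


def entropy(rho):
    """Von Neumann entropy in bits; numerically zero eigenvalues are dropped."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))


def apply_channel(kraus, rho):
    """N(rho) = sum_k K_k rho K_k^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus)


def complementary_output(kraus, rho):
    """N^c(rho) for U = sum_k K_k (x) |k>^E: entries Tr(K_k rho K_l^dagger)."""
    r = len(kraus)
    out = np.zeros((r, r), dtype=complex)
    for k in range(r):
        for l in range(r):
            out[k, l] = np.trace(kraus[k] @ rho @ kraus[l].conj().T)
    return out


def purify(rho):
    """|Psi> on A (x) A' (A a copy of A') with Tr_A |Psi><Psi| = rho."""
    d = rho.shape[0]
    evals, evecs = np.linalg.eigh(rho)
    psi = np.zeros(d * d, dtype=complex)
    for i in range(d):
        psi += np.sqrt(max(evals[i], 0.0)) * np.kron(np.eye(d)[i], evecs[:, i])
    return psi


def ic_via_environment(kraus, rho):
    """I_c = H(B) - H(E), using the complementary channel."""
    return entropy(apply_channel(kraus, rho)) - entropy(complementary_output(kraus, rho))


def ic_via_purification(kraus, rho):
    """I_c = H(B) - H(AB) = -H(A|B) on omega^{AB} = (id (x) N)(Psi)."""
    d, d_out = rho.shape[0], kraus[0].shape[0]
    psi = purify(rho)
    Psi = np.outer(psi, psi.conj())
    ops = [np.kron(np.eye(d), K) for K in kraus]
    omega = sum(M @ Psi @ M.conj().T for M in ops)
    omega_B = np.trace(omega.reshape(d, d_out, d, d_out), axis1=0, axis2=2)
    return entropy(omega_B) - entropy(omega)


# Illustrative example (not from the text): amplitude damping with gamma = 0.3.
gamma = 0.3
kraus = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]]),
         np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])]
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
print(ic_via_environment(kraus, rho), ic_via_purification(kraus, rho))
```

For a pure input both routes give zero, in line with the remark above.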

It is common to write

$$ I_c(A\rangle B)_\omega = -H(A|B)_\omega, $$

acknowledging the directionality of coherent information from $A$ to $B$. While we will freely interchange the two notations for coherent information throughout this dissertation, we will generally write $I_c(A\rangle B)$ when characterizing capacity regions and proving the converses for the main theorems, while the notation $I_c(\rho, \mathcal{N})$ will be utilized more frequently in the coding theorems.

Let us review a few facts concerning coherent information. 

Property. For a maximally entangled state

$$ |\Phi\rangle^{AB} = \frac{1}{\sqrt{k}} \sum_{i=1}^{k} |i\rangle^A |i\rangle^B, $$

we have

$$ I_c(A\rangle B)_\Phi = \log k. $$

Proof.

$$ H(B)_\Phi - H(AB)_\Phi = H\!\left(\frac{\mathbb{1}_k}{k}\right) - 0 = \log k. $$
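The property is easy to confirm numerically. The sketch below (Python with NumPy, logarithms base 2) builds $|\Phi\rangle$ for a few values of $k$ and evaluates $H(B) - H(AB)$; the chosen values of $k$ are arbitrary.

```python
import numpy as np


def entropy(rho):
    """Von Neumann entropy in bits; numerically zero eigenvalues are dropped."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))


def max_entangled(k):
    """Density matrix of |Phi> = k^{-1/2} sum_i |i>^A |i>^B."""
    phi = np.eye(k).reshape(k * k) / np.sqrt(k)
    return np.outer(phi, phi)


def coherent_info(omega, dA, dB):
    """I_c(A>B) = H(B) - H(AB) for a state omega on A (x) B."""
    omega_B = np.trace(omega.reshape(dA, dB, dA, dB), axis1=0, axis2=2)
    return entropy(omega_B) - entropy(omega)


for k in (2, 3, 4):
    print(k, coherent_info(max_entangled(k), k, k), np.log2(k))
```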

Property. For any state on $AB$,

$$ I_c(A\rangle B) \leq \min\{H(A), H(B)\}. $$

Proof. We begin by observing that

$$ I_c(A\rangle B) = H(B) - H(AB) \leq H(B). $$

To see that $I_c(A\rangle B) \leq H(A)$, we start with the Araki-Lieb inequality

$$ |H(B) - H(A)| \leq H(AB). $$

Getting rid of the absolute value and subtracting $H(B)$ from each side yields

$$ -H(A) \leq H(A|B). $$

Multiplying both sides by $-1$ completes the argument.
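A randomized check of the bound, and of the Araki-Lieb inequality used in its proof, is sketched below in Python with NumPy. The random states (normalized $GG^\dagger$ for a complex Gaussian matrix $G$) and the dimensions $|A| = 2$, $|B| = 3$ are arbitrary illustrative choices; such a check cannot replace the proof.

```python
import numpy as np


def entropy(rho):
    """Von Neumann entropy in bits; numerically zero eigenvalues are dropped."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))


def random_density_matrix(d, rng):
    """Random full-rank state G G^dagger / Tr(G G^dagger), G a complex Gaussian matrix."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def marginals(omega, dA, dB):
    """Return (rho_A, rho_B) for a state on A (x) B."""
    t = omega.reshape(dA, dB, dA, dB)
    return np.trace(t, axis1=1, axis2=3), np.trace(t, axis1=0, axis2=2)


def satisfies_bounds(omega, dA, dB, tol=1e-9):
    """Check Araki-Lieb and I_c(A>B) <= min{H(A), H(B)} for one state."""
    rho_A, rho_B = marginals(omega, dA, dB)
    h_A, h_B, h_AB = entropy(rho_A), entropy(rho_B), entropy(omega)
    araki_lieb = abs(h_B - h_A) <= h_AB + tol
    return araki_lieb and (h_B - h_AB) <= min(h_A, h_B) + tol


rng = np.random.default_rng(0)
results = [satisfies_bounds(random_density_matrix(6, rng), 2, 3) for _ in range(200)]
print(all(results))
```

For pure states on $AB$ the bound is attained, since then $H(AB) = 0$ and $H(A) = H(B)$.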
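Property. For any channel $\mathcal{N}: A' \to B$ and any $\rho^{A'}$,

$$ I_c(\rho, \mathcal{N}) \leq \log |A'|. $$

Proof. Fix a purification $|\Psi\rangle^{AA'}$ of $\rho^{A'}$ and let $\omega^{AB} = \mathcal{N}(\Psi)$. Then

$$ I_c(\rho, \mathcal{N}) = I_c(A\rangle B)_\omega \leq H(A)_\omega = H(A)_\Psi \leq \log |A'|. $$

The final inequality holds because $H(A)_\Psi = H(A')_\Psi$ for the pure state $\Psi$.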



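Property. For fixed $\rho^{A'}$, $I_c(\rho, \mathcal{N})$ is a convex function of $\mathcal{N}$.

Proof. Fixing a purification $|\Psi\rangle^{AA'}$ of $\rho^{A'}$, observe that the state $\omega^{AB} = \mathcal{N}(\Psi)$ is a linear function of $\mathcal{N}$. But $H(A|B)$ is concave in $\omega^{AB}$, and thus in $\mathcal{N}$, so $I_c(\rho, \mathcal{N}) = -H(A|B)$ is convex in $\mathcal{N}$. ■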
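The convexity statement can be illustrated by mixing two qubit channels, using the fact that the Kraus operators $\{\sqrt{\lambda}K_i\} \cup \{\sqrt{1-\lambda}K'_j\}$ implement $\lambda\mathcal{N}_1 + (1-\lambda)\mathcal{N}_2$. In the Python (NumPy) sketch below, the identity channel, the completely depolarizing channel and the input state are purely illustrative choices.

```python
import numpy as np


def entropy(rho):
    """Von Neumann entropy in bits; numerically zero eigenvalues are dropped."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))


def purify(rho):
    """|Psi> on A (x) A' with Tr_A |Psi><Psi| = rho."""
    d = rho.shape[0]
    evals, evecs = np.linalg.eigh(rho)
    psi = np.zeros(d * d, dtype=complex)
    for i in range(d):
        psi += np.sqrt(max(evals[i], 0.0)) * np.kron(np.eye(d)[i], evecs[:, i])
    return psi


def coherent_info(kraus, rho):
    """I_c(rho, N) = H(B) - H(AB) on omega = (id (x) N)(Psi)."""
    d, d_out = rho.shape[0], kraus[0].shape[0]
    psi = purify(rho)
    Psi = np.outer(psi, psi.conj())
    ops = [np.kron(np.eye(d), K) for K in kraus]
    omega = sum(M @ Psi @ M.conj().T for M in ops)
    omega_B = np.trace(omega.reshape(d, d_out, d, d_out), axis1=0, axis2=2)
    return entropy(omega_B) - entropy(omega)


def mix_channels(kraus1, kraus2, lam):
    """Kraus operators of lam * N1 + (1 - lam) * N2."""
    return [np.sqrt(lam) * K for K in kraus1] + [np.sqrt(1 - lam) * K for K in kraus2]


I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]])
identity_channel = [I2]
full_depolarizing = [P / 2 for P in (I2, X, Y, Z)]  # output is always I/2

rho = np.array([[0.6, 0.1], [0.1, 0.4]])  # illustrative input state
ic1 = coherent_info(identity_channel, rho)
ic2 = coherent_info(full_depolarizing, rho)
for lam in np.linspace(0, 1, 11):
    ic_mix = coherent_info(mix_channels(identity_channel, full_depolarizing, lam), rho)
    print(round(float(lam), 1), ic_mix, lam * ic1 + (1 - lam) * ic2)
```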

Remark. This property is in close agreement with the corresponding statement that $I(X;Y)$ is convex in $p(y|x)$. However, $I_c(\rho, \mathcal{N})$ is not generally concave or convex in $\rho$ for a fixed $\mathcal{N}$.

3.5 Conditional coherent information 

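In the appendix (Section 11.1), we show that if $\mathcal{N}: A' \to BX$ is an instrument with components $\{p(x)\mathcal{N}_x\}$, then any isometric extension $U: A' \to BEX$ of $\mathcal{N}$ can be expressed as

$$ U = \sum_x \sqrt{p(x)}\, |x\rangle^X \otimes |x\rangle^{X'} \otimes U_x, $$

where the $U_x: A' \to BE'$ are isometric extensions of the $\mathcal{N}_x$, and $E = E'X'$. We also verify there that $\mathrm{Tr}_E\, U = \mathcal{N}$, while $\mathrm{Tr}_{BX}\, U = \mathcal{N}^c$, where

$$ \mathcal{N}^c = \sum_x p(x)\, |x\rangle\langle x|^{X'} \otimes \mathcal{N}_x^c. $$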

Above, each component $p(x)\mathcal{N}_x^c$ is formed from a complement $\mathcal{N}_x^c: A' \to E'$ of the corresponding normalized component $\mathcal{N}_x$ of $\mathcal{N}$. The main observation here is that the environment $E = E'X'$ of the instrument includes the environment $E'$ common to the component channels $\mathcal{N}_x$, as well as a part $X'$ which purifies the classical component $X$ of $\mathcal{N}$.

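For any $\rho^{A'}$, the coherent information over $\mathcal{N}$ can thus be expressed as

$$
\begin{aligned}
I_c(\rho, \mathcal{N}) &= H(\mathcal{N}(\rho)) - H(\mathcal{N}^c(\rho)) \\
&= H\Big(\sum_x p(x)\, |x\rangle\langle x|^X \otimes \mathcal{N}_x(\rho)\Big) - H\Big(\sum_x p(x)\, |x\rangle\langle x|^{X'} \otimes \mathcal{N}_x^c(\rho)\Big) \\
&= H(X) + \sum_x p(x) H(\mathcal{N}_x(\rho)) - H(X) - \sum_x p(x) H(\mathcal{N}_x^c(\rho)) \\
&= \sum_x p(x)\, I_c(\rho, \mathcal{N}_x).
\end{aligned}
$$

In the third line, we mirror the calculation of the entropy of a cq state performed earlier. The coherent information over $\mathcal{N}$ is thus just the average of the coherent information over each $\mathcal{N}_x$. Another way to see this is to note that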

$$
\begin{aligned}
I_c(\rho, \mathcal{N}) &= H(\mathcal{N}(\rho)) - H(\mathcal{N}^c(\rho)) \\
&= H(BX) - H(E'X') \\
&= H(B|X) - H(E'|X').
\end{aligned}
$$

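A third derivation fixes a purification $|\Psi\rangle^{AA'}$ of $\rho^{A'}$ and defines the state

$$ |\Omega\rangle^{ABEX} = |\Omega\rangle^{ABE'X'X} = (\mathbb{1}^A \otimes U)|\Psi\rangle^{AA'}, $$

noting that

$$
\begin{aligned}
H(BX) - H(E) &= H(BX) - H(ABX) \\
&= -H(A|BX) \\
&= I_c(A\rangle BX).
\end{aligned}
$$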
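The averaging identity $I_c(\rho, \mathcal{N}) = \sum_x p(x) I_c(\rho, \mathcal{N}_x)$ can be checked by building $\omega^{XAB} = \sum_x p(x)|x\rangle\langle x|^X \otimes (\mathrm{id}^A \otimes \mathcal{N}_x)(\Psi^{AA'})$ and evaluating $-H(A|BX) = H(BX) - H(ABX)$. In the Python (NumPy) sketch below, the two components (amplitude damping and dephasing), the probabilities $p(x)$ and the input state are hypothetical examples.

```python
import numpy as np


def entropy(rho):
    """Von Neumann entropy in bits; numerically zero eigenvalues are dropped."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))


def purify(rho):
    """|Psi> on A (x) A' with Tr_A |Psi><Psi| = rho."""
    d = rho.shape[0]
    evals, evecs = np.linalg.eigh(rho)
    psi = np.zeros(d * d, dtype=complex)
    for i in range(d):
        psi += np.sqrt(max(evals[i], 0.0)) * np.kron(np.eye(d)[i], evecs[:, i])
    return psi


def omega_AB(kraus, rho):
    """(id^A (x) N)(Psi^{AA'}) for a purification Psi of rho."""
    d = rho.shape[0]
    psi = purify(rho)
    Psi = np.outer(psi, psi.conj())
    ops = [np.kron(np.eye(d), K) for K in kraus]
    return sum(M @ Psi @ M.conj().T for M in ops)


def ptrace(rho, dims, keep):
    """Partial trace keeping the subsystems listed in `keep` (original order)."""
    n = len(dims)
    t = rho.reshape(list(dims) + list(dims))
    for i in sorted(set(range(n)) - set(keep), reverse=True):
        t = np.trace(t, axis1=i, axis2=i + t.ndim // 2)
    d = int(np.prod([dims[i] for i in keep]))
    return t.reshape(d, d)


def coherent_info(kraus, rho):
    """I_c(rho, N) = H(B) - H(AB), assuming qubit input and output."""
    omega = omega_AB(kraus, rho)
    return entropy(ptrace(omega, [2, 2], [1])) - entropy(omega)


gamma, p_deph = 0.3, 0.2  # hypothetical channel parameters
amp_damp = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]]),
            np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])]
dephase = [np.sqrt(1 - p_deph) * np.eye(2), np.sqrt(p_deph) * np.diag([1.0, -1.0])]
components, px = [amp_damp, dephase], [0.4, 0.6]
rho = np.array([[0.7, 0.2], [0.2, 0.3]])

# omega^{XAB} = sum_x p(x) |x><x|^X (x) (id^A (x) N_x)(Psi^{AA'}), ordered X, A, B.
omega_XAB = sum(p * np.kron(np.diag(np.eye(2)[x]), omega_AB(K, rho))
                for x, (p, K) in enumerate(zip(px, components)))
ic_conditional = entropy(ptrace(omega_XAB, [2, 2, 2], [0, 2])) - entropy(omega_XAB)
ic_average = sum(p * coherent_info(K, rho) for p, K in zip(px, components))
print(ic_conditional, ic_average)
```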



Chapter 4 

Capacity theorems for single-user channels

In this chapter we recall various existing capacity theorems from the literature. After 
reviewing the proof of the capacity theorem for a classical channel, we will see that 
the main ingredients of that proof have counterparts for quantum channels, both for 
the transmission of classical and of quantum information. The element common to all of these situations is as follows. Each assumes that the sender and receiver are able
to transmit an unlimited number of times over a collection of identical channels. It 
is useful to think of these channels as acting in parallel, as sequential transmissions 
can be thought of as parallel transmissions "in time". After giving an operational 
definition of a set of rates at which the sender can communicate to the receiver 
arbitrarily well, the capacity is then defined to be the supremum, or least upper bound, 
of those achievable rates, representing the ultimate rate at which arbitrarily reliable 
communication can occur, provided that the channel can be used any number of times. 
The capacity is then described, or characterized, in terms of some optimization of 
entropic quantities over a well-defined collection of classical probabilities or quantum 
states. Later, when we characterize various capacity regions for quantum multiple 
access channels, we will invoke the single-user coding theorems for quantum channels 
introduced in this chapter.

4.1 Classical capacities of classical channels

Suppose that two parties, Alice and Bob, are connected by a large number of identical classical channels with probability transition matrix $p(y|x)$. This is to be interpreted as follows. At any given time, Alice can choose to send a symbol $x \in \mathcal{X}$ to Bob. Because of noise, Bob "hears" a corrupted version of the symbol $x$. Specifically, he receives the symbol $y \in \mathcal{Y}$ with the conditional probability $p(y|x)$. Fixing a probability distribution $p(x)$ on Alice's input symbols defines a random variable $X$. Together with the conditional probabilities $p(y|x)$, this yields a joint distribution $p(x, y)$ of a pair of correlated random variables $X$ and $Y$. The classical capacity of the channel $p(y|x)$ is the logarithm of the number of distinguishable inputs per channel use, when Alice uses the channel many times to send Bob a message which he can ascertain
arbitrarily well. Shannon jUj gave the following formula for the capacity: 

$$ C = \max_{p(x)} I(X;Y). \qquad (4.1) $$

Mathematically, he proved that this expression equals a certain operationally defined capacity which we now review. Suppose Alice tries to use the channel $n$ times to send information to Bob at a rate of $R$ bits per channel use. To this end, she selects a collection of codewords, consisting of $2^{nR}$ sequences of input symbols $x^n(m)$, one sequence for each message she would like to send, and reveals them to Bob. This can be modeled by an encoding function

$$ f: 2^{nR} \to \mathcal{X}^n, \qquad f(m) = x^n(m), $$

where we identify $2^{nR}$ with the message set $\{1, \ldots, 2^{nR}\}$. Since the channel is noisy, Bob will receive a noisy version of Alice's message, denoted $Y^n(m)$. Let the decoding function

$$ g: \mathcal{Y}^n \to 2^{nR} $$

describe some scheme by which Bob attempts to decide which message Alice had intended for him to receive. Using this scheme, Alice and Bob have effectively created a new channel

$$ Q(\hat{m}|m) = \sum_{y^n : g(y^n) = \hat{m}} p(y^n | f(m)), $$

whereby each message $m \in 2^{nR}$ Alice may choose to send induces a distribution on the possible messages $\hat{m}$ Bob may decode. We might allow Alice to use a stochastic encoder $p(x^n|m)$, in which case the effective channel would be

$$ Q(\hat{m}|m) = \sum_{x^n} \sum_{y^n : g(y^n) = \hat{m}} p(y^n | x^n)\, p(x^n | m). $$
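For a concrete instance of the effective channel $Q(\hat{m}|m)$, the Python sketch below enumerates all outputs $y^n$ for a hypothetical code: the length-3 repetition code on a binary symmetric channel with crossover probability $p = 0.1$, decoded by majority vote. Neither the code nor the channel comes from the text; they are simply a small case where the sum over $\{y^n : g(y^n) = \hat{m}\}$ is easy to verify by hand, giving $Q(1|0) = 3p^2(1-p) + p^3 = 0.028$.

```python
from itertools import product


def bsc_likelihood(y, x, p):
    """p(y^n | x^n) for n independent uses of a binary symmetric channel."""
    prob = 1.0
    for yi, xi in zip(y, x):
        prob *= p if yi != xi else 1.0 - p
    return prob


def effective_channel(encode, decode, messages, n, p):
    """Q[m][m_hat] = sum of p(y^n | f(m)) over all y^n with g(y^n) = m_hat."""
    Q = {m: {m_hat: 0.0 for m_hat in messages} for m in messages}
    for m in messages:
        x = encode(m)
        for y in product((0, 1), repeat=n):
            Q[m][decode(y)] += bsc_likelihood(y, x, p)
    return Q


def repetition_encoder(m):
    """Hypothetical encoder f: repeat the message bit three times."""
    return (m,) * 3


def majority_decoder(y):
    """Hypothetical decoder g: majority vote over the three received bits."""
    return int(sum(y) >= 2)


Q = effective_channel(repetition_encoder, majority_decoder, [0, 1], 3, 0.1)
print(Q)
```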

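If Alice sends the message $m \in 2^{nR}$, the probability that Bob decodes the message incorrectly can be expressed in a number of ways:

$$
\begin{aligned}
P_e(m) &= \Pr\{\hat{M} \neq m \mid M = m\} \\
&= \Pr\{g(Y^n(m)) \neq m\} \\
&= 1 - Q(m|m) \\
&= \sum_{\hat{m} \neq m} Q(\hat{m}|m).
\end{aligned}
$$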

Associated to the coded channel $Q(\hat{m}|m)$ is its maximal probability of error

$$ P_{\max} = \max_m P_e(m) $$

and its average probability of error

$$ P_{\mathrm{ave}} = 2^{-nR} \sum_m P_e(m). $$
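Given the matrix of a coded channel, both error criteria are straightforward to compute. The $4 \times 4$ matrix in the Python sketch below is a made-up example with $2^{nR} = 4$ messages, used only to illustrate the definitions.

```python
def error_probabilities(Q):
    """Per-message P_e(m) = 1 - Q(m|m), plus P_max and P_ave, for Q[m][m_hat]."""
    pe = [1.0 - Q[m][m] for m in range(len(Q))]
    return pe, max(pe), sum(pe) / len(pe)


# Hypothetical coded channel with 4 messages; each row sums to 1.
Q = [[0.90, 0.05, 0.03, 0.02],
     [0.02, 0.97, 0.01, 0.00],
     [0.10, 0.10, 0.70, 0.10],
     [0.00, 0.01, 0.01, 0.98]]
pe, p_max, p_ave = error_probabilities(Q)
print(pe, p_max, p_ave)
```

Here $P_{\max} = 0.3$ while $P_{\mathrm{ave}} = 0.1125$, so a small average error alone says little about the worst message.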

One may phrase the goal of successful communication as that of simulating a fictitious identity channel $\mathrm{id}: 2^{nR} \to 2^{nR}$ from Alice to Bob, where $\mathrm{id}(\hat{m}|m) = \delta_{m,\hat{m}}$. Perfect simulation would amount to using a zero-error code. Approximate simulation can be gauged in a number of ways. For example, one could require that either $P_{\mathrm{ave}}$ or $P_{\max}$ is small. Since $P_{\mathrm{ave}} \leq P_{\max}$, the latter implies the former.

Suppose that Alice chooses her message $M$ randomly according to the distribution $P(m)$. If she sends her message through the identity channel to Bob, the two will hold a perfectly correlated pair of random variables $(M, \widetilde{M})$, distributed as

$$ \mathrm{dist}(M, \widetilde{M})_P(m, \hat{m}) = P(m)\, \delta_{m,\hat{m}}. $$

However, Alice will actually be sending through the coded channel $Q(\hat{m}|m)$, generating a pair of noisy correlated random variables $(M, \hat{M})$ distributed as

$$ \mathrm{dist}(M, \hat{M})_P(m, \hat{m}) = P(m)\, Q(\hat{m}|m). $$

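One way to judge the success of the simulation is to consider the $\ell_1$ distance $\Delta(P)$ between the two distributions $\mathrm{dist}(M, \widetilde{M})_P$ and $\mathrm{dist}(M, \hat{M})_P$. This is calculated as

$$
\begin{aligned}
\Delta(P) &= \big\| \mathrm{dist}(M, \widetilde{M})_P - \mathrm{dist}(M, \hat{M})_P \big\|_1 \\
&= \sum_{m, \hat{m} = 1}^{2^{nR}} \big| P(m)\delta_{m,\hat{m}} - P(m) Q(\hat{m}|m) \big| \\
&= \sum_{m, \hat{m} = 1}^{2^{nR}} P(m) \big| \delta_{m,\hat{m}} - Q(\hat{m}|m) \big| \\
&= \sum_{m=1}^{2^{nR}} P(m) \Big( \big(1 - Q(m|m)\big) + \sum_{\hat{m} \neq m} Q(\hat{m}|m) \Big) \\
&= 2 \sum_{m=1}^{2^{nR}} P(m) P_e(m) \\
&= 2\, \mathbb{E}_P P_e(M).
\end{aligned}
$$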



In other words, the $\ell_1$ distance between the ideal and the actual joint distributions is precisely equal to twice the expected error probability. Observe that

$$ \Delta(\mathrm{unif}(2^{nR})) = 2P_{\mathrm{ave}} \quad \text{and} \quad \Delta(\delta_m) = 2P_e(m). $$

Further note that requiring that the maximal error probability be less than $\epsilon$ is equivalent to demanding that $\Delta(\delta_m) < 2\epsilon$ for each $m$, where $\delta_m$ is a point distribution at $\{M = m\}$. The latter requirement is also equivalent to requiring that $\Delta(P) < 2\epsilon$ for all distributions $P(m)$.
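The identity $\Delta(P) = 2\,\mathbb{E}_P P_e(M)$ can be checked directly from the definition of the $\ell_1$ distance. In the Python sketch below, the coded channel and the distributions $P$ are hypothetical examples; the uniform distribution and a point distribution reproduce $2P_{\mathrm{ave}}$ and $2P_e(m)$ respectively.

```python
import random


def simulation_distance(P, Q):
    """l1 distance between the ideal P(m) delta_{m,m_hat} and the actual P(m) Q(m_hat|m)."""
    total = 0.0
    for m, pm in enumerate(P):
        for m_hat, q in enumerate(Q[m]):
            ideal = pm if m_hat == m else 0.0
            total += abs(ideal - pm * q)
    return total


def expected_error(P, Q):
    """E_P P_e(M) with P_e(m) = 1 - Q(m|m)."""
    return sum(pm * (1.0 - Q[m][m]) for m, pm in enumerate(P))


# Hypothetical coded channel Q[m][m_hat] with 4 messages (rows sum to 1).
Q = [[0.90, 0.05, 0.03, 0.02],
     [0.02, 0.97, 0.01, 0.00],
     [0.10, 0.10, 0.70, 0.10],
     [0.00, 0.01, 0.01, 0.98]]
uniform = [0.25] * 4
point = [0.0, 0.0, 1.0, 0.0]  # delta_m for the third message
random.seed(3)
w = [random.random() for _ in range(4)]
skewed = [x / sum(w) for x in w]
for P in (uniform, point, skewed):
    print(simulation_distance(P, Q), 2 * expected_error(P, Q))
```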

So, communication can be viewed in the light of generating near perfect common randomness over noisy channels. We have phrased things in this way as it makes the road to quantum communication a bit easier. Rather than asking the sender and receiver to end up with classical correlations, we will see later in Section 4.3 that they attempt to build quantum correlations.

Any code $(f, g)$ which encodes $2^{nR}$ messages using $n$ instances of a channel $p(y|x)$ such that $P_e(m) < \epsilon$ for all $m \in 2^{nR}$ will be called an $(R, n, \epsilon)$ maximal error code for the channel $p(y|x)$. A rate $R$ is said to be achievable if there exists a sequence of $(R, n, \epsilon_n)$ maximal error codes with $\epsilon_n \to 0$. The (operational) capacity of the channel $p(y|x)$ is then defined to be the supremum of the set of achievable rates. Shannon's capacity theorem states that this operationally defined capacity is equal to the number $C$ defined in (4.1).
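The number $C$ in (4.1) can be computed numerically for small alphabets. The Python sketch below uses the Blahut-Arimoto algorithm, a standard iterative method for this maximization that is not discussed in the text, and compares its output with two closed forms: $1 - h(0.11)$ for a binary symmetric channel with crossover probability 0.11, and $\log_2(5/4)$ for a Z-channel in which input 1 is received as 0 with probability 1/2. Both channels and the iteration count are illustrative choices.

```python
import math


def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def mutual_information(px, W):
    """I(X;Y) in bits for input distribution px and channel matrix W[x][y] = p(y|x)."""
    ny = len(W[0])
    py = [sum(px[x] * W[x][y] for x in range(len(px))) for y in range(ny)]
    return sum(px[x] * W[x][y] * math.log2(W[x][y] / py[y])
               for x in range(len(px)) for y in range(ny)
               if px[x] > 0 and W[x][y] > 0)


def blahut_arimoto(W, iters=2000):
    """Approximate C = max_{p(x)} I(X;Y) with the Blahut-Arimoto iteration."""
    nx, ny = len(W), len(W[0])
    px = [1.0 / nx] * nx
    for _ in range(iters):
        py = [sum(px[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # Relative entropy D(W(.|x) || py) in nats for each input symbol x.
        div = [sum(W[x][y] * math.log(W[x][y] / py[y]) for y in range(ny) if W[x][y] > 0)
               for x in range(nx)]
        weights = [px[x] * math.exp(div[x]) for x in range(nx)]
        total = sum(weights)
        px = [w / total for w in weights]
    return mutual_information(px, W), px


bsc = [[0.89, 0.11], [0.11, 0.89]]
z_channel = [[1.0, 0.0], [0.5, 0.5]]
print(blahut_arimoto(bsc)[0], 1 - h2(0.11))
print(blahut_arimoto(z_channel)[0], math.log2(1.25))
```

For the Z-channel the optimizing input distribution is not uniform (it puts probability 0.4 on the noisy symbol), which is why an iterative optimizer is useful even in this tiny case.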

The channel capacity theorem is proved in two main parts. First, it is proven that for any rate $R < C$, $R$ is achievable. This is provided by a coding theorem, which is generally structured as follows. Given $\epsilon > 0$ and some rate $R < C$, it is shown that there is a long enough blocklength $n$ so that there exists an $(R, n, \epsilon)$ code. As $\epsilon$ was arbitrary, this immediately implies the existence of a sequence of such codes which achieves the rate $R$, corresponding to any sequence of error probabilities which go to zero. The second component is called the converse. In this part, it is shown that every achievable rate $R$ satisfies $R \leq C$. These components are summarized in Figure 4.1.

One route to proving the coding theorem involves first showing that codes with a weaker error constraint exist. Rather than requiring that every message have a low error probability, it is sufficient to show that the error probability, averaged over all codewords $m \in 2^{nR}$, is small. A code satisfying this weaker constraint will be called an average error code. A way to prove such a coding theorem is through the technique of random coding. For an arbitrary distribution $p(x)$, define the product distribution $p(x^n) = \prod_{i=1}^n p(x_i)$. A random encoder is then defined by randomly selecting $2^{nR}$ codewords

$$ \mathcal{C} = \{X^n(1), \ldots, X^n(2^{nR})\} $$

i.i.d. according to $p(x^n)$. The following coding proposition, or some variant thereof, is proved in many textbooks on information theory, such as in [HH ITT].

[Figure 4.1: Components of a capacity theorem. Coding theorem: $R < C \Rightarrow R$ achievable. Converse theorem: $R$ achievable $\Rightarrow R \leq C$. Capacity theorem: $R < C \Leftrightarrow R$ achievable.]

Proposition (Classical channel coding theorem). Given is a channel $p(y|x)$, an input distribution $p(x)$, and a number $0 < R < I(X;Y)$, where $I(X;Y)$ is computed with respect to $p(x, y) = p(x)p(y|x)$. For every $\epsilon > 0$, there is $n$ sufficiently large so that if $2^{nR}$ codewords $\mathcal{C} = \{X^n(1), \ldots, X^n(2^{nR})\}$ are chosen i.i.d. according to the product distribution $p(x^n) = \prod_i p(x_i)$, there exists a decoding function $g: \mathcal{Y}^n \to 2^{nR}$ which depends on the random choice of codebook $\mathcal{C}$ and correctly identifies the input message with expected average probability of error less than $\epsilon$, in the sense that

$$ \mathbb{E}_{\mathcal{C}}\, 2^{-nR} \sum_m \Pr\{g(Y^n(m)) = m\} \geq 1 - \epsilon. $$

Observe that, because of the symmetry in the code construction, the expectation of each term in the above summation is the same. It is thus possible to reexpress that error condition as

$$ \mathbb{E}_{\mathcal{C}} \Pr\{g(Y^n(1)) = 1\} \geq 1 - \epsilon, $$

showing that at the level of random codes, one may assume without loss of generality that the message $m = 1$ has been sent.

It is a simple task to "derandomize" any code which is guaranteed to exist by the proposition above. Suppose that Alice chooses a message uniformly distributed on the set $\{1, \ldots, 2^{nR}\}$, represented by the random variable $M$, to send to Bob. Then

$$ \mathbb{E}_{\mathcal{C}} \Pr\{g(Y^n(M)) = M\} = 2^{-nR} \sum_m \mathbb{E}_{\mathcal{C}} \Pr\{g(Y^n(m)) = m\} \geq 1 - \epsilon. $$

Since this expectation over codebooks is at least $1 - \epsilon$, there must exist a particular deterministic code yielding an average probability of success at least as large as $1 - \epsilon$.

So far, this is enough to conclude that every input distribution $p(x)$ yields a lower bound on the average error capacity of $p(y|x)$. This is because each $p(x)$ corresponds to a set of achievable rates $\{R : 0 < R < I(X;Y)\}$, and the largest such set is given by optimizing over all $p(x)$.

Recall that we have defined the operational capacity $C$ in terms of the maximal probability of error constraint. However, we have only outlined how to show that codes with low average error exist. By Markov's inequality from probability theory, if the average error probability is less than $\epsilon$, with $\epsilon \leq 1/4$, then at least half of the codewords have an error probability less than $\sqrt{\epsilon}$. By only using these codewords, a rate $R - \frac{1}{n}$ code with maximal error probability $\sqrt{\epsilon}$ is obtained, and thus every rate less than $R$ is achievable with maximal error, showing that the maximal and average error capacities are the same.
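The expurgation step can be illustrated with hypothetical per-codeword error probabilities: sort the codewords by error probability and keep the better half. By Markov's inequality the worst kept codeword then has error at most $2P_{\mathrm{ave}}$, and at most $\sqrt{P_{\mathrm{ave}}}$ whenever $P_{\mathrm{ave}} \leq 1/4$. The synthetic numbers in the Python sketch below are not drawn from any actual code.

```python
import math
import random


def expurgate(pe):
    """Indices of the better half of the codewords, ranked by error probability."""
    order = sorted(range(len(pe)), key=lambda m: pe[m])
    return order[: len(pe) // 2]


random.seed(1)
M = 2 ** 10  # hypothetical number of codewords, 2^{nR}
pe = [random.random() ** 4 for _ in range(M)]  # hypothetical per-codeword error probabilities
avg = sum(pe) / M
kept = expurgate(pe)
worst_kept = max(pe[m] for m in kept)
print(f"average {avg:.4f}, worst kept {worst_kept:.4f}, "
      f"2*average {2 * avg:.4f}, sqrt(average) {math.sqrt(avg):.4f}")
```

Keeping exactly half of the $2^{nR}$ codewords leaves $2^{nR - 1} = 2^{n(R - 1/n)}$ of them, which is the rate loss of $\frac{1}{n}$ mentioned above.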

While the coding proposition implies the existence of sequences of codes achieving any rate less than capacity, it remains to prove that no such sequences exist for rates above capacity. Rather than reproduce the entire converse theorem, we outline its basic structure. First, one assumes that $R$ is an achievable rate. This means that there should exist a sequence of $(2^{nR}, n, \epsilon_n)$ codes with $\epsilon_n \to 0$. For any $n$, let $p(x^n, y^n) = p(x^n) \prod_i p(y_i|x_i)$ be the joint distribution on $X^n$ and $Y^n$ induced by selecting codewords uniformly at random from the corresponding code in the sequence. An initial step in the proof shows that

$$ R \leq \frac{1}{n} I(X^n; Y^n) + \epsilon'_n, $$

where $\epsilon'_n \to 0$ as $\epsilon_n \to 0$, and $I(X^n; Y^n)$ is evaluated with respect to the induced distribution $p(x^n, y^n)$. For any joint distribution on $X^n$ and $Y^n$ of this form, that is, one in which $Y^n$ is obtained from $X^n$ through $n$ independent uses of the channel, the following can be easily proved:

$$ \frac{1}{n} I(X^n; Y^n) \leq \frac{1}{n} \sum_{i=1}^{n} I(X_i; Y_i) \leq \max_i I(X_i; Y_i). $$

If $i^*$ achieves the maximum on the right hand side, the marginal distribution $p(x_{i^*})$ provides a "witness" to the fact that the achievable rate $R$ satisfies $R \leq I(X_{i^*}; Y_{i^*}) + \epsilon'_n \leq C + \epsilon'_n$ ($R$ is thus, in the limit, no larger than the maximum mutual information over all input distributions). This proves that the capacity formula is additive, and thus that every achievable rate is upper bounded by the solution of a "single-letter" optimization problem. For this reason, this second conceptual step in the converse is known as single-letterization. Without it, one would only be able to write the capacity as

$$ C = \lim_{k \to \infty} \frac{1}{k} \max_{p(x^k)} I(X^k; Y^k), $$

a result which follows by applying the classical channel coding theorem to extensions of the channel

$$ p(y^k | x^k) = \prod_{i=1}^{k} p(y_i | x_i). $$

Such an expression has become known as a "regularized" expression for the capacity.

Regularization is, in fact, a persistent problem in quantum information theory. The best known expressions characterizing the capacities of an arbitrary quantum channel to transmit classical or quantum information are regularized maximizations of information quantities over appropriate sets of input states.

4.2 Classical capacities of quantum channels 

Suppose that Alice and Bob are connected via some large number $n$ of instances of a quantum channel $\mathcal{N}$, and that Alice wishes to transmit classical messages to Bob. The overall maximal rate at which this is possible is the classical capacity $C(\mathcal{N})$ of the channel $\mathcal{N}$, which is the logarithm of the number of physical input preparations Alice can make, per channel use, so that Bob can distinguish them arbitrarily well by measuring the induced states at the outputs of the channels. The best known expression for the classical capacity of a quantum channel, due to Holevo, Schumacher and Westmoreland [12], is the following regularized formula, known as the HSW Theorem:

$$ C(\mathcal{N}) = \lim_{k \to \infty} \frac{1}{k} \max I(X; B^k)_\omega. $$

Here, the maximization is over all pure state input ensembles $\{p(x), |\phi_x\rangle^{A'^k}\}$ of states for Alice to prepare at the inputs to $k$ parallel instances of the channel $\mathcal{N}$. For a given ensemble, the mutual information is computed relative to the corresponding cq state

$$ \omega^{XB^k} = \sum_x p(x)\, |x\rangle\langle x|^X \otimes \mathcal{N}^{\otimes k}(\phi_x). $$
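For $k = 1$, the mutual information of the cq state can be computed either as $H(X) + H(B) - H(XB)$ or, equivalently, as the Holevo quantity $H(\sum_x p(x)\rho_x) - \sum_x p(x) H(\rho_x)$ with $\rho_x = \mathcal{N}(\phi_x)$. The Python (NumPy) sketch below checks that the two agree for a hypothetical ensemble of $|0\rangle$ and $|+\rangle$ with equal probabilities, sent through a qubit depolarizing channel with parameter 0.2.

```python
import numpy as np


def entropy(rho):
    """Von Neumann entropy in bits; numerically zero eigenvalues are dropped."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))


def proj(v):
    """|v><v| for a normalized vector v."""
    return np.outer(v, v.conj())


def depolarize(rho, q):
    """Illustrative qubit depolarizing channel (1 - q) rho + q Tr(rho) I/2."""
    return (1 - q) * rho + q * np.trace(rho) * np.eye(2) / 2


def mutual_info_cq(px, states):
    """I(X;B) = H(X) + H(B) - H(XB) for omega = sum_x p(x)|x><x| (x) rho_x."""
    k = len(px)
    omega_XB = sum(p * np.kron(np.diag(np.eye(k)[x]), rho)
                   for x, (p, rho) in enumerate(zip(px, states)))
    omega_B = sum(p * rho for p, rho in zip(px, states))
    return entropy(np.diag(px)) + entropy(omega_B) - entropy(omega_XB)


def holevo_chi(px, states):
    """H(sum_x p(x) rho_x) - sum_x p(x) H(rho_x)."""
    avg = sum(p * rho for p, rho in zip(px, states))
    return entropy(avg) - sum(p * entropy(rho) for p, rho in zip(px, states))


ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
px = [0.5, 0.5]
outputs = [depolarize(proj(ket0), 0.2), depolarize(proj(plus), 0.2)]
print(mutual_info_cq(px, outputs), holevo_chi(px, outputs))
```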

Operationally, the classical capacity of $\mathcal{N}$ is defined in analogy to that of a classical channel. A $(2^{nR}, n)$ code consists of $2^{nR}$ message states $\{|\phi_1\rangle^{A'^n}, \ldots, |\phi_{2^{nR}}\rangle^{A'^n}\}$ for Alice and a corresponding measurement for Bob, mathematically modeled as a POVM with $2^{nR}$ outcomes $\{\Lambda_m\}_{m \in 2^{nR}}$. We call this code a $(2^{nR}, n, \epsilon)$ code if the following constraint on success probability, averaged over all messages, is satisfied:

$$ 2^{-nR} \sum_m \mathrm{Tr}\, \Lambda_m \mathcal{N}^{\otimes n}(\phi_m) \geq 1 - \epsilon. $$

A rate $R$ is achievable if there exists a sequence of $(2^{nR}, n, \epsilon_n)$ codes with $\epsilon_n \to 0$, and the capacity $C(\mathcal{N})$ is the supremum of all achievable rates.
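As a minimal illustration of the success criterion, consider $n = 1$ and two messages encoded as $|0\rangle$ and $|+\rangle$, sent through a qubit depolarizing channel (a hypothetical example). Choosing the Helstrom measurement, which projects onto the positive part of $\rho_0 - \rho_1$, gives average success probability $\frac{1}{2} + \frac{1}{4}\|\rho_0 - \rho_1\|_1$. The Python (NumPy) sketch below evaluates the left-hand side of the code condition directly and compares it with this closed form.

```python
import numpy as np


def proj(v):
    """|v><v| for a normalized vector v."""
    return np.outer(v, v.conj())


def depolarize(rho, q):
    """Illustrative qubit depolarizing channel (1 - q) rho + q Tr(rho) I/2."""
    return (1 - q) * rho + q * np.trace(rho) * np.eye(2) / 2


def helstrom_povm(rho0, rho1):
    """Lambda_0 projects onto the positive eigenspace of rho0 - rho1; Lambda_1 = I - Lambda_0."""
    evals, evecs = np.linalg.eigh(rho0 - rho1)
    pos = evecs[:, evals > 0]
    lam0 = pos @ pos.conj().T
    return [lam0, np.eye(rho0.shape[0]) - lam0]


def average_success(outputs, povm):
    """2^{-nR} sum_m Tr(Lambda_m N(phi_m)), one output state per message."""
    total = sum(np.trace(L @ rho) for L, rho in zip(povm, outputs))
    return float(np.real(total)) / len(outputs)


outputs = [depolarize(proj(np.array([1.0, 0.0])), 0.2),
           depolarize(proj(np.array([1.0, 1.0]) / np.sqrt(2)), 0.2)]
povm = helstrom_povm(*outputs)
success = average_success(outputs, povm)
trace_norm = float(np.sum(np.abs(np.linalg.eigvalsh(outputs[0] - outputs[1]))))
print(success, 0.5 + 0.25 * trace_norm)
```

With these numbers the success probability is about 0.78, so this pair of states and measurement would qualify as a $(2^{nR}, n, \epsilon)$ code with $n = 1$, $2^{nR} = 2$ and $\epsilon = 0.25$, but not with $\epsilon = 0.2$.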

As with the capacity of a classical channel, the proof that $C(\mathcal{N})$ can be expressed in such a regularized form has two parts, a coding theorem and a converse. The following coding theorem is attributed to Holevo [53] and to Schumacher and Westmoreland.

Proposition 1 (HSW Theorem). Given is a cq state $\sigma^{XB} = \sum_x p(x)\, |x\rangle\langle x|^X \otimes \rho_x$ and a number $0 < R < I(X;B)_\sigma$. For every $\epsilon > 0$, there is $n$ sufficiently large so that if $2^{nR}$ codewords $\mathcal{C} = \{X^n(m)\}$ are chosen i.i.d. according to the product distribution $p(x^n) = \prod_{i=1}^n p(x_i)$, corresponding to the states $\rho_{x^n} = \rho_{x_1} \otimes \cdots \otimes \rho_{x_n}$, there exists a decoding POVM $\{\Lambda_m\}$ on $B^n$ which depends on the random choice of codebook $\mathcal{C}$ and correctly identifies the index $m$ with average probability of error less than $\epsilon$, in the sense that

$$ \mathbb{E}_{\mathcal{C}}\, 2^{-nR} \sum_{m=1}^{2^{nR}} \mathrm{Tr}\, \rho_{X^n(m)} \Lambda_m \geq 1 - \epsilon. \qquad (4.2) $$

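Due to the symmetry of the distribution of $\mathcal{C}$ under codeword permutations, it is clear that the expectations of each term in the above sum are equal. In other words,

$$ \mathbb{E}_{\mathcal{C}}\, 2^{-nR} \sum_{m=1}^{2^{nR}} \mathrm{Tr}\, \rho_{X^n(m)} \Lambda_m = \mathbb{E}_{\mathcal{C}}\, \mathrm{Tr}\, \rho_{X^n(1)} \Lambda_1. \qquad (4.3) $$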

The arguments for derandomization and for obtaining a good maximal error code are 
identical to those used for classical channels in the previous section. 

A proof of the converse begins, as before, by assuming that $R$ is an achievable rate. Taking a cq state $\omega^{XB^n}$ induced by an $(R, n, \epsilon_n)$ code in the achieving sequence,
Fano's inequality (Lemma EI) and the Holevo Bound (Lemma [7j) are used ^ to show 
that 

$$ R \leq \frac{1}{n} I(X; B^n)_\omega + \epsilon'_n, $$

where again $\epsilon'_n \to 0$ as $\epsilon_n \to 0$. However, it is an important open problem whether a single-letterization step can be proved. No counterexample to additivity is known, and it is widely believed that none exists.

^These details are given more explicitly in the converse proofs of the main theorems (Section 7.2).

4.3 Quantum capacities of quantum channels

The quantum capacity $Q(\mathcal{N})$ of a quantum channel $\mathcal{N}: A' \to B$ is the answer to a number of physical questions regarding the possibilities of performing various operational information processing tasks over many parallel instances of the channel $\mathcal{N}$. $Q(\mathcal{N})$ is the logarithm of various quantities:

• the amount of entanglement that can be created (entanglement generation) 

• the amount of entanglement that can be sent (entanglement transmission) 

• the size of a Hilbert space all of whose states can be reliably transmitted 
(subspace transmission) 

• the size of a Hilbert space all of whose entangled states can be reliably trans- 
mitted (strong subspace transmission). 

All of these quantities have units of qubits per channel use, and as the rates at which these tasks are possible all coincide, it is justifiable to say that they all represent "sending quantum information," and hence to speak of a single quantum capacity $Q(\mathcal{N})$. The best known characterization of the quantum capacity is a regularized maximization of the coherent information

$$ Q(\mathcal{N}) = \lim_{k \to \infty} \frac{1}{k} \max_{|\Psi\rangle^{AA'^k}} I_c(A\rangle B^k)_\omega, $$

where for each $k$, the maximization is over all states of the form

$$ \omega^{AB^k} = (\mathrm{id}^A \otimes \mathcal{N}^{\otimes k})(\Psi^{AA'^k}). $$

Such a state $\omega$ will be said to arise from $\mathcal{N}^{\otimes k}$, or rather, to arise from the action of $\mathcal{N}^{\otimes k}$ on the bipartite pure state $|\Psi\rangle^{AA'^k}$. Here, the regularization is known to be necessary for a general quantum channel, as opposed to the case of the classical capacity $C(\mathcal{N})$, where the existence of a single-letterization step in the converse is an open problem. The existence of a counterexample to the additivity of coherent information is known [46].

Of the different operational definitions of $Q(\mathcal{N})$, the simplest to describe is entanglement generation, since it can be defined without explicit mention of encodings. Suppose that a large number $n$ of channels $\mathcal{N}: A' \to B$ are available from Alice to Bob. Alice and Bob will use the channels to build a large maximally entangled state between degrees of freedom of some physical systems located in their respective laboratories. To this end, Alice prepares some bipartite pure state $|\Upsilon\rangle^{AA'^n}$, entangled between some system $A$ of dimension $|A| = 2^{nQ}$ in her laboratory and the inputs $A'^n$ of the channels. After the actions of the channels, Alice's system $A$ is quantum mechanically correlated with the outputs $B^n$ of the channels. Bob then performs some post-processing procedure, modeled by a quantum operation $\mathcal{D}: B^n \to \hat{A}$, to transfer the quantum correlations from the outputs $B^n$ of the channels to an "output" physical system $\hat{A}$, also of dimension $|\hat{A}| = 2^{nQ}$, in his laboratory. Their goal is to produce a state which is close to some target maximally entangled state $|\Phi\rangle^{A\hat{A}}$. More specifically, we say that they generate entanglement at rate $Q$ if they produce a maximally entangled state of the form

$$ |\Phi\rangle^{A\hat{A}} = 2^{-nQ/2} \sum_{i=1}^{2^{nQ}} |i\rangle^A |i\rangle^{\hat{A}}. $$

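We will call such a state a rate $Q$ maximally entangled state. The blocklength $n$ will always be apparent from the context.

$(|\Upsilon\rangle^{AA'^n}, \mathcal{D})$ will be called a $(Q, n, \epsilon)$ entanglement generation code for the channel $\mathcal{N}$ if, for the rate $Q$ maximally entangled state $|\Phi\rangle^{A\hat{A}}$, we have

$$ F\big(\Phi^{A\hat{A}}, (\mathrm{id}^A \otimes \mathcal{D} \circ \mathcal{N}^{\otimes n})(\Upsilon^{AA'^n})\big) \geq 1 - \epsilon. $$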
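Checking the code condition amounts to evaluating the overlap of the output state with the target $|\Phi\rangle^{A\hat{A}}$. The Python (NumPy) sketch below takes $F(\Phi, \omega) = \langle\Phi|\omega|\Phi\rangle$, which is an assumption about the fidelity convention, and uses a hypothetical output $\omega = (1-q)\Phi + q\,\mathbb{1}/k^2$ with $k = 2^{nQ} = 2$ and $q = 0.1$, for which $F = 1 - q + q/k^2 = 0.925$.

```python
import numpy as np


def max_entangled_vector(k):
    """|Phi> = k^{-1/2} sum_i |i>^A |i>^{A_hat} as a vector on A (x) A_hat."""
    return np.eye(k).reshape(k * k) / np.sqrt(k)


def overlap_with_target(omega, k):
    """<Phi| omega |Phi> for the target maximally entangled state of dimension k = 2^{nQ}."""
    phi = max_entangled_vector(k)
    return float(np.real(phi.conj() @ omega @ phi))


def is_good_code(omega, k, eps):
    """Code condition F >= 1 - eps, with F taken as the overlap above (assumed convention)."""
    return overlap_with_target(omega, k) >= 1 - eps


k = 2    # 2^{nQ} with nQ = 1 (hypothetical)
q = 0.1  # hypothetical weight of white noise in the output state
phi = max_entangled_vector(k)
omega = (1 - q) * np.outer(phi, phi) + q * np.eye(k * k) / k ** 2
print(overlap_with_target(omega, k), is_good_code(omega, k, 0.1), is_good_code(omega, k, 0.05))
```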

A rate $Q$ is an achievable rate for entanglement generation over the channel $\mathcal{N}$ if there exists a sequence of $(Q, n, \epsilon_n)$ entanglement generation codes with $\epsilon_n \to 0$. The entanglement generating capacity $Q_{\mathrm{EG}}(\mathcal{N})$ of $\mathcal{N}$ is then defined operationally as the supremum of all such achievable rates.

We will now introduce a number of coding propositions from [13], each a more refined version of the previous one. While the first is sufficient to prove achievability for single-user channels, the others have additional properties which we will need later when we characterize various capacity regions of quantum multiple access channels.

Proposition (Entanglement generation coding theorem). Given is a channel $\mathcal{N}: A' \to B$, a density matrix $\rho^{A'}$, and a number $0 < Q < I_c(\rho, \mathcal{N})$. For every $\epsilon > 0$, there is $n$ sufficiently large so that there is a $(Q, n, \epsilon)$ entanglement generation code $(|\Upsilon\rangle^{AA'^n}, \mathcal{D})$ for $\mathcal{N}$.

Recall the discussion in Chapter 3 regarding the two different ways of expressing coherent information. Given an input density operator $\rho^{A'}$, if $|\Psi\rangle^{AA'}$ is any purification of $\rho$, then the identity

$$ I_c(\rho, \mathcal{N}) = I_c(A\rangle B)_{\mathcal{N}(\Psi)} $$

holds. This proposition then guarantees that for every state $\omega^{AB} = \mathcal{N}(\Psi)$ arising from the action of $\mathcal{N}$ on a state $|\Psi\rangle^{AA'}$, every rate $0 < Q < I_c(A\rangle B)_\omega$ is an achievable rate. This works by applying the coding theorem to the input state $\rho^{A'} = \mathrm{Tr}_A \Psi$.

As with the classical capacity, it is also true that for each integer $k > 0$, if $\omega$ arises from $\mathcal{N}^{\otimes k}$, then every rate $0 < Q < \frac{1}{k} I_c(A\rangle B^k)_\omega$ is achievable as well. We then conclude that

$$ Q(\mathcal{N}) \geq \lim_{k \to \infty} \frac{1}{k} \max I_c(A\rangle B^k). $$
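For the qubit dephasing channel $\rho \mapsto (1-p)\rho + pZ\rho Z$, the $k = 1$ term of this lower bound can be evaluated directly. The Python (NumPy) sketch below scans diagonal inputs and confirms that the maximally mixed input gives $I_c = 1 - h(p)$; for this particular channel it is known that the single-letter value already equals $Q(\mathcal{N})$. The value $p = 0.1$ and the input grid are illustrative choices.

```python
import numpy as np


def entropy(rho):
    """Von Neumann entropy in bits; numerically zero eigenvalues are dropped."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))


def purify(rho):
    """|Psi> on A (x) A' with Tr_A |Psi><Psi| = rho."""
    d = rho.shape[0]
    evals, evecs = np.linalg.eigh(rho)
    psi = np.zeros(d * d, dtype=complex)
    for i in range(d):
        psi += np.sqrt(max(evals[i], 0.0)) * np.kron(np.eye(d)[i], evecs[:, i])
    return psi


def coherent_info(kraus, rho):
    """I_c(rho, N) = H(B) - H(AB) on omega = (id (x) N)(Psi)."""
    d, d_out = rho.shape[0], kraus[0].shape[0]
    psi = purify(rho)
    Psi = np.outer(psi, psi.conj())
    ops = [np.kron(np.eye(d), K) for K in kraus]
    omega = sum(M @ Psi @ M.conj().T for M in ops)
    omega_B = np.trace(omega.reshape(d, d_out, d, d_out), axis1=0, axis2=2)
    return entropy(omega_B) - entropy(omega)


def h2(p):
    """Binary entropy in bits, for 0 < p < 1."""
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))


p = 0.1  # illustrative dephasing probability
dephasing = [np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * np.diag([1.0, -1.0])]
ts = np.linspace(0.05, 0.95, 19)
values = [coherent_info(dephasing, np.diag([t, 1 - t])) for t in ts]
print("best t on grid:", ts[int(np.argmax(values))])
print(coherent_info(dephasing, np.eye(2) / 2), 1 - h2(p))
```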

The usual Shannon-theoretic prescription for converse theorems applies here as well, although, as mentioned above, it is known that a single-letterization step cannot be proved for arbitrary $\mathcal{N}$. Suppose that $Q$ is achievable, and fix a $(Q, n, \epsilon_n)$ entanglement generation code $(|\Upsilon\rangle^{AA'^n}, \mathcal{D})$ in the achieving sequence of codes. The encoding $|\Upsilon\rangle$ gives rise to the state $\omega^{AB^n} = \mathcal{N}^{\otimes n}(\Upsilon)$. It is a simple consequence of the quantum
data processing inequality (Lemma El) and continuity of coherent information in the 
input density operator (Lemma EI) that ^ 

$$ Q \leq \frac{1}{n} I_c(A\rangle B^n)_\omega + \epsilon'_n, $$

where $\epsilon'_n \to 0$. By standard arguments we then conclude that

$$ Q(\mathcal{N}) \leq \lim_{k \to \infty} \frac{1}{k} \max I_c(A\rangle B^k). $$

^These details are given more explicitly in the converse proofs of the main theorems (Section 7.2).

The state $\mathrm{Tr}_A \Upsilon$ which is induced by Alice's encoding at the inputs $A'^n$ of $\mathcal{N}^{\otimes n}$ is called the code density operator of the entanglement generation code. With randomization, it is possible to make this operator arbitrarily close to the product state $\rho^{\otimes n}$, where $\rho^{A'}$ is the input density matrix used when invoking the proposition. If Alice and Bob have access to a shared source of randomness, they may utilize an ensemble of codes to this end. This is very useful for our multiple access coding theorems, as it guarantees that if one sender codes randomly, the induced channel seen by the other sender is close to a product channel, allowing coding theorems for product channels to be invoked.

A $(Q, n, \epsilon)$ random entanglement generation code consists of a collection of deterministic $(Q, n, \epsilon)$ entanglement generation codes $(|\Upsilon_\gamma\rangle^{AA'^n}, \mathcal{D}_\gamma)$ and a probability distribution $P_\gamma$, corresponding to a source of shared common randomness available to both sender and receiver. We will often omit the subscript once the randomness of the code has been clarified, and it will be understood that $|\Upsilon\rangle$ and $\mathcal{D}$ constitute a pair of classically correlated random objects. Associated to a random code is its expected, or average, code density operator

$$ \bar{\rho}^{A'^n} = \mathbb{E}_\gamma \mathrm{Tr}_A \Upsilon = \sum_\gamma P_\gamma \mathrm{Tr}_A \Upsilon_\gamma, $$

which is the expectation, over the shared randomness, of the state at the channel inputs $A'^n$. The following extension of the previous coding proposition pertains to these random codes and is also proved in [13].
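The role of shared randomness in shaping the average code density operator can be seen in a toy case with $n = 1$: take one deterministic encoding whose code density operator is some $\sigma \neq \rho$, and let the shared randomness select a Pauli operator applied at the channel input. Averaging over the four Paulis with equal weight yields $\mathbb{1}/2$ exactly. The state $\sigma$, the Pauli ensemble and the target $\rho = \mathbb{1}/2$ are illustrative assumptions; the Python (NumPy) sketch tracks only $\bar{\rho}$ and does not model the decoders $\mathcal{D}_\gamma$.

```python
import numpy as np


def purify(rho):
    """|Psi> on A (x) A' with Tr_A |Psi><Psi| = rho."""
    d = rho.shape[0]
    evals, evecs = np.linalg.eigh(rho)
    psi = np.zeros(d * d, dtype=complex)
    for i in range(d):
        psi += np.sqrt(max(evals[i], 0.0)) * np.kron(np.eye(d)[i], evecs[:, i])
    return psi


def code_density_operator(upsilon, d):
    """Tr_A |Upsilon><Upsilon| for a pure state on A (x) A' with |A| = |A'| = d."""
    U = np.outer(upsilon, upsilon.conj())
    return np.trace(U.reshape(d, d, d, d), axis1=0, axis2=2)


def trace_norm(M):
    """||M||_1 for a Hermitian matrix M."""
    return float(np.sum(np.abs(np.linalg.eigvalsh(M))))


paulis = [np.eye(2), np.array([[0, 1], [1, 0]]),
          np.array([[0, -1j], [1j, 0]]), np.diag([1.0, -1.0])]
sigma = np.array([[0.8, 0.3], [0.3, 0.2]])  # hypothetical code density operator of one code
base = purify(sigma)
# |Upsilon_gamma> = (1 (x) P_gamma)|Upsilon>: Pauli P_gamma applied at the channel input A'.
codes = [np.kron(np.eye(2), P) @ base for P in paulis]
weights = [0.25] * 4                        # P_gamma: uniform shared randomness
rho_bar = sum(w * code_density_operator(u, 2) for w, u in zip(weights, codes))
target = np.eye(2) / 2                      # rho^{A'}, with n = 1
print(trace_norm(code_density_operator(base, 2) - target), trace_norm(rho_bar - target))
```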

Proposition (Random entanglement generation coding theorem). Given is a channel $\mathcal{N}: A' \to B$, a density matrix $\rho^{A'}$, and a number $0 < Q < I_c(\rho, \mathcal{N})$. For every $\epsilon > 0$, there is $n$ sufficiently large so that there is a $(Q, n, \epsilon)$ random entanglement generation code $(P_\gamma, |\Upsilon_\gamma\rangle^{AA'^n}, \mathcal{D}_\gamma)$ for $\mathcal{N}$ with average code density operator $\bar{\rho}^{A'^n}$ satisfying

$$ \|\bar{\rho}^{A'^n} - \rho^{\otimes n}\|_1 \leq \epsilon. $$
Finally, there are certain features of the decoder structure of random entanglement 
generation codes that are necessary for proofs which utilize quantum side information 
at the decoder. This final form of the coding proposition is the most powerful, utilizing 
features which are implicit from the proof of the coding theorem of pS]- This will be 
the proposition which is invoked later in the dissertation. 

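Proposition 2. Given is a channel $\mathcal{N}: A' \to B$, a density matrix $\rho^{A'}$, and a number $0 < Q < I_c(\rho, \mathcal{N})$. For every $\epsilon > 0$, there is $n$ sufficiently large so that there is a random $(Q, n, \epsilon)$ entanglement generation code $(P_\gamma, |\Upsilon_\gamma\rangle^{AA'^n}, \mathcal{D}_\gamma)$ for $\mathcal{N}$ with average code density operator

$$ \bar{\rho}^{A'^n} = \mathbb{E}_\gamma \mathrm{Tr}_A \Upsilon = \sum_\gamma P_\gamma \mathrm{Tr}_A \Upsilon_\gamma $$

satisfying

$$ \|\bar{\rho}^{A'^n} - \rho^{\otimes n}\|_1 \leq \epsilon. $$

Furthermore, given any particular isometric extension $U_{\mathcal{N}}: A' \to BE$ of $\mathcal{N}$, it is possible to choose isometric extensions $U_{\mathcal{D}_\gamma}: B^n \to \hat{A}F$ of the deterministic decoders $\mathcal{D}_\gamma$ so that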



for every i and the same fixed pure state |A) 



Chapter 5 
Main results 



5.1 Quantum multiple access channels 

For this dissertation, a quantum multiple access channel will have two senders and a single receiver. While many-sender generalizations of the theorems which appear here are readily obtainable, we focus on the case with two senders for simplicity. Such a channel $\mathcal{N}: A'B' \to C$ will generally be one in which Alice and Bob simultaneously transmit to Charlie. We will assume throughout that no other resources are available to the three parties. Namely, none of the parties share any prior classical or quantum correlations between themselves, nor do they have access to any other auxiliary channels. If Alice inputs a physical system with density matrix $\rho_1^{A'}$, while Bob's input has density matrix $\rho_2^{B'}$, Charlie will receive the state $\mathcal{N}(\rho_1 \otimes \rho_2)$.

In the next section, we give an operational definition of the four-dimensional region $S(\mathcal{N})$, which consists of the rates at which each sender can simultaneously send classical and quantum information to Charlie. Sections 5.3 and 5.4 state the main results of this dissertation. These results characterize the two-dimensional shadows of $S(\mathcal{N})$ corresponding to the situation where Alice sends classically while Bob sends quantum information (Theorem 1), and that where each sends quantum information (Theorem 2).

These theorems will be proved by first showing in Chapter 7 that the characterizations given in Sections 5.3 and 5.4 describe other sets of operationally defined rates, corresponding to weaker constraints on good codes than those to be introduced in this chapter. In a later chapter it will ultimately be shown that these other sets of operationally defined rates equal those introduced in this chapter.

5.2 $S(\mathcal{N})$ - the general problem

Assume that Alice and Bob are connected to Charlie by $n$ instances of a multiple access channel $\mathcal{N}: A'B' \to C$, where Alice and Bob respectively have control over the $A'^n$ and $B'^n$ inputs. We will describe a scenario in which Alice wishes to transmit classical information at a rate of $R_a$ bits per channel use, while simultaneously transmitting quantum information at a rate of $Q_a$ qubits per channel use. At the same time, Bob will be transmitting classical and quantum information at rates of $R_b$ and $Q_b$ respectively. Alice attempts to convey any one of $2^{nR_a}$ messages to Charlie, while Bob tries to send him one of $2^{nR_b}$ such messages. We will also assume that the senders are presented with systems $A$ and $B$, where $|A| = 2^{nQ_a}$ and $|B| = 2^{nQ_b}$. Each will be required to complete the following two-fold task. First, they must individually transfer the quantum information embodied in $A$ and $B$ to their respective inputs $A'^n$ and $B'^n$ of the channels, in such a way that it is recoverable by Charlie at the receiver. Second, they must simultaneously make Charlie aware of their independent messages $M_a$ and $M_b$. Alice and Bob will encode with maps from the cq systems holding their classical and quantum messages to their respective inputs of $\mathcal{N}^{\otimes n}$, which we denote

$$ \mathcal{E}_1: M_a A \to A'^n \quad \text{and} \quad \mathcal{E}_2: M_b B \to B'^n. $$

Charlie decodes with a quantum instrument

$$ \mathcal{D}: C^n \to \hat{M}_a \hat{M}_b \hat{A} \hat{B}. $$

The output systems are assumed to be of the same sizes and dimensions as their respective input systems. For the quantum systems, we assume that there are pre-agreed upon unitary correspondences $\mathrm{id}_A: A \to \hat{A}$ and $\mathrm{id}_B: B \to \hat{B}$ between the degrees of freedom in the quantum systems presented to Alice and Bob which embody the quantum information they are presented with and the target systems in Charlie's laboratory to which that information should be transferred. The goal for quantum communication will be to simulate, in the strongest sense, the actions of these corresponding identity channels. We similarly demand low error probability for each pair of classical messages. Formally, $(\mathcal{E}_1, \mathcal{E}_2, \mathcal{D})$ will be said to comprise an $(R_a, R_b, Q_a, Q_b, n, \epsilon)$ strong subspace transmission code for the channel $\mathcal{N}$ if for all $m_a \in 2^{nR_a}$, $m_b \in 2^{nR_b}$, $|\Psi_1\rangle^{\tilde{A}A}$, $|\Psi_2\rangle^{\tilde{B}B}$, where $\tilde{A}$ and $\tilde{B}$ are purifying systems of arbitrary dimensions,

where 

We will say that a rate vector $(R_a, R_b, Q_a, Q_b)$ is achievable if there exists a sequence of $(R_a, R_b, Q_a, Q_b, n, \epsilon_n)$ strong subspace transmission codes with $\epsilon_n \to 0$. The simultaneous capacity region $S(\mathcal{N})$ is then defined as the closure of the collection of achievable rates. Setting pairs of rates equal to zero uncovers six two-dimensional rate regions. The next section contains our first theorem, which gives a multi-letter characterization of the two shadows relevant to the situation where one user only sends classical information, while the other only sends quantum information. The following section contains a theorem which describes, via a multi-letter formula, the rates at which each sender can send quantum information.

5.3 $CQ(\mathcal{N})$ - classical-quantum capacity region

Suppose that Alice only wishes to send classical information at a rate of $R$ bits per channel use, while Bob will only send quantum information at $Q$ qubits per use of the channel. The rate pairs $(R, Q)$ at which this is possible comprise a classical-quantum (cq) region $CQ(\mathcal{N})$ consisting of rate vectors in $S(\mathcal{N})$ of the form $(R, 0, 0, Q)$. Our first theorem characterizes $CQ(\mathcal{N})$ as a regularized union of rectangles.

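Theorem 1. $CQ(\mathcal{N})$ is the closure of the union of pairs of nonnegative rates $(R, Q)$ satisfying

$$R \le \frac{1}{k} I(X; C^k)_\omega, \qquad Q \le \frac{1}{k} I_c(B \rangle C^k X)_\omega$$

for some $k$, some pure state ensemble $\{p(x), |\phi_x\rangle^{A'^k}\}$ and some bipartite pure state $|\Psi\rangle^{BB'^k}$ giving rise to the state

$$\omega^{XBC^k} = \sum_x p(x) |x\rangle\langle x|^X \otimes \mathcal{N}^{\otimes k}(\phi_x \otimes \Psi). \tag{5.1}$$

Further, it is sufficient to consider ensembles for which

$$|\mathcal{X}| \le \max\{|A'|, |C|\}^{2k}.$$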

This characterization does not appear to lead to a finite computation for determining the capacity region, since in general it does not admit a single-letter form. However, as an application, the following example exhibits a channel for which this region is additive.

Example. Consider an erasure channel into which Alice inputs a classical bit (or rather, a qubit that will be dephased into the $\{|0\rangle^{A'}, |1\rangle^{A'}\}$ basis), while Bob inputs a qubit. If Alice inputs $|0\rangle^{A'}$, Charlie receives Bob's qubit without error. If Alice inputs $|1\rangle^{A'}$, Charlie receives a pure erasure state $|e\rangle^C$, which is orthogonal to the degrees of freedom of Bob's input state. The cq capacity region of this channel is equal to the collection of pairs of nonnegative cq rates $(R, Q)$ which satisfy

$$R \le H(p), \qquad Q \le 1 - 2p$$

for some $0 \le p \le 1/2$. This region is pictured in Figure 5.1.

Figure 5.1: $CQ$ (erasure channel)

Proof. In Section 9.1 we prove this for the more general case where Bob inputs a $d$-level quantum system. ■
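The region in the erasure example can be traced out numerically. The sketch below is our own reading, not the construction used in the proof. It assumes Alice sends the erasure input $|1\rangle$ with probability $p$ and Bob sends half of a maximally entangled qubit. Under those assumptions, $I(X;C) = H(p)$, and $I_c(B\rangle CX) = (1-p)(+1) + p(-1) = 1 - 2p$. The helper names `erasure_corner` and `in_erasure_cq_region` are hypothetical, and the membership test only approximates the union of rectangles on a grid of $p$ values.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with the convention H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def erasure_corner(p: float) -> tuple:
    """Corner (R, Q) of the rectangle obtained when Alice sends the erasure input
    |1> with probability p and Bob sends half of a maximally entangled qubit.

    R = I(X;C) = H(p): the output reveals whether an erasure occurred.
    Q = I_c(B>CX) = (1 - p)(+1) + p(-1): given X, Bob's qubit either arrives
    intact (coherent information +1) or is replaced by |e> (coherent information -1).
    """
    return binary_entropy(p), 1.0 - 2.0 * p


def in_erasure_cq_region(R: float, Q: float, grid: int = 10_000) -> bool:
    """Approximate membership test: is (R, Q) dominated by the corner for some
    p on a uniform grid of [0, 1/2]?"""
    if R < 0.0 or Q < 0.0:
        return False
    for i in range(grid + 1):
        r_max, q_max = erasure_corner(0.5 * i / grid)
        if R <= r_max + 1e-12 and Q <= q_max + 1e-12:
            return True
    return False
```

For instance, $(R, Q) = (0.5, 0.7)$ lies in the region (take $p = 0.15$), while $(0.9, 0.5)$ does not. Any $p$ with $H(p) \ge 0.9$ forces $1 - 2p < 0.5$.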

Remark. It is also possible to characterize $CQ(\mathcal{N})$ as a regularized union of pentagons, a form which is analogous to the result of [?] for classical multiple access channels. We do not yet know an example of a channel for which this characterization is single-letter (and not equivalent to the rectangle region above). We therefore defer further consideration of this characterization until Chapter [?].

Remark. The proof of the bound on $|\mathcal{X}|$ is found in the appendix (Section [?]).

5.4 $Q(\mathcal{N})$ - quantum-quantum capacity region

The situation in which each sender only attempts to convey quantum information to Charlie is described by the quantum-quantum (qq) rate region $Q(\mathcal{N})$. It consists of rate vectors in $S(\mathcal{N})$ of the form $(0, 0, Q_a, Q_b)$. Our second theorem characterizes $Q(\mathcal{N})$ as a regularized union of pentagons.

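Theorem 2. $Q(\mathcal{N})$ is the closure of the union of pairs of nonnegative rates $(Q_a, Q_b)$ satisfying

$$\begin{aligned} Q_a &\le \tfrac{1}{k} I_c(A \rangle B C^k)_\omega \\ Q_b &\le \tfrac{1}{k} I_c(B \rangle A C^k)_\omega \\ Q_a + Q_b &\le \tfrac{1}{k} I_c(AB \rangle C^k)_\omega \end{aligned}$$

for some $k$ and some bipartite pure states $|\Psi_1\rangle^{AA'^k}$, $|\Psi_2\rangle^{BB'^k}$ giving rise to

$$\omega^{ABC^k} = \mathcal{N}^{\otimes k}(\Psi_1 \otimes \Psi_2). \tag{5.2}$$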



Example. An example of a channel for which this region is single-letter is a channel into which Alice and Bob each input a qubit. With probability $p$, both of their qubits simultaneously undergo a phase flip, or $180^\circ$ rotation about the $z$-axis, before being received by Charlie. Otherwise, Charlie receives both qubits without error. The qq capacity region of this channel is given by a single pentagon, consisting of the pairs of nonnegative qq rates $(Q_a, Q_b)$ which satisfy

$$Q_a \le 1, \qquad Q_b \le 1, \qquad Q_a + Q_b \le 2 - H(p).$$

Proof. See Section [?]. ■

Remark. There does not appear to be any obstacle preventing application of the methods used in this paper to prove many-sender generalizations of Theorems 1 and 2. For simplicity, we have focused on the situation with two senders.

Remark. Contrary to the corresponding result for classical multiple access channels, the regions of Theorems 1 and 2 do not require convexification. That this follows from the multi-letter nature of the regions will be demonstrated in the appendix (Section [?]).
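For the phase-flip example above, the single pentagon can be written down directly. The minimal sketch below only encodes the three stated inequalities, so it inherits whatever assumptions underlie them. The function names are hypothetical. The two nontrivial vertices, $(1, 1 - H(p))$ and $(1 - H(p), 1)$, are where the sum-rate constraint meets the individual constraints.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def in_phase_flip_pentagon(qa: float, qb: float, p: float, tol: float = 1e-12) -> bool:
    """Check Q_a <= 1, Q_b <= 1 and Q_a + Q_b <= 2 - H(p) for nonnegative rates."""
    if qa < 0.0 or qb < 0.0:
        return False
    return (qa <= 1.0 + tol and qb <= 1.0 + tol
            and qa + qb <= 2.0 - binary_entropy(p) + tol)


def phase_flip_corners(p: float) -> list:
    """Vertices of the pentagon, listed counterclockwise from the origin."""
    s = 2.0 - binary_entropy(p)
    return [(0.0, 0.0), (1.0, 0.0), (1.0, s - 1.0), (s - 1.0, 1.0), (0.0, 1.0)]
```

At $p = 0$ the pentagon is the full unit square. At $p = 1/2$ it collapses to the triangle $Q_a + Q_b \le 1$.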



Chapter 6 

Supplementary results 



In this chapter, we collect a number of auxiliary results which will be used to prove the main theorems. The first section contains some relationships satisfied by two distance measures, the trace distance and the fidelity, which will comprise the machinery used to prove the coding theorems. The main novel contribution of that section is the statement and proof of Lemma 2. The next section contains other lemmas, proved elsewhere, which will be needed later. In the third section we review strong subadditivity of quantum entropy and explore a number of its consequences. These include quantum versions of the classical data processing inequality, as well as the fact that conditioning decreases conditional quantum entropy or, equivalently, increases coherent information. We also obtain a particularly elegant proof of the Holevo bound on the accessible information of an ensemble of quantum states.



6.1 Further properties of distance measures 

We first collect some relevant results which will be used in what follows, starting with some relationships between our distance measures. If $\rho$ and $\sigma$ are density matrices defined on the same (or isomorphic) Hilbert spaces, set

$$F = F(\rho, \sigma) \quad\text{and}\quad T = \|\rho - \sigma\|_1.$$

Then, the following inequalities hold (see e.g. [?]):

$$1 - \sqrt{F} \le T/2 \le \sqrt{1 - F}, \tag{6.1}$$

$$1 - T \le F \le 1 - T^2/4. \tag{6.2}$$
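Inequalities (6.1) and (6.2) can be sanity-checked numerically in the special case of commuting states. There they reduce to statements about the spectra $p$ and $q$: $F = (\sum_i \sqrt{p_i q_i})^2$ (the squared-overlap convention of Uhlmann's formula) and $T = \sum_i |p_i - q_i|$, with no factor of $1/2$. This is a spot check under that assumption, not a proof of the general quantum statement. The helper names are our own.

```python
import math
import random


def fidelity(p, q):
    """F(rho, sigma) = (sum_i sqrt(p_i q_i))^2 for commuting states with spectra p, q."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q)) ** 2


def trace_norm(p, q):
    """T = ||rho - sigma||_1 = sum_i |p_i - q_i| (no factor of 1/2, as in the text)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def random_spectrum(d, rng):
    """A random probability vector of length d."""
    w = [rng.random() for _ in range(d)]
    total = sum(w)
    return [x / total for x in w]


def check_6_1_and_6_2(trials=2000, d=4, seed=0, tol=1e-12):
    """Assert (6.1) and (6.2) on random pairs of commuting states."""
    rng = random.Random(seed)
    for _ in range(trials):
        p, q = random_spectrum(d, rng), random_spectrum(d, rng)
        F, T = fidelity(p, q), trace_norm(p, q)
        assert 1 - math.sqrt(F) <= T / 2 + tol            # (6.1), left
        assert T / 2 <= math.sqrt(max(0.0, 1 - F)) + tol  # (6.1), right
        assert 1 - T <= F + tol                           # (6.2), left
        assert F <= 1 - T * T / 4 + tol                   # (6.2), right
    return True
```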

From these inequalities, we can derive the following more useful relationships

$$F \ge 1 - \epsilon \implies T \le 2\sqrt{\epsilon}, \tag{6.3}$$

$$T \le \epsilon \implies F \ge 1 - \epsilon, \tag{6.4}$$

which are valid for $0 \le \epsilon \le 1$. Uhlmann [?] has given the following characterization of fidelity

$$F(\rho, \sigma) = \max_{|\Phi_\rho\rangle, |\Phi_\sigma\rangle} |\langle \Phi_\rho | \Phi_\sigma \rangle|^2 = \max_{|\Phi_\rho\rangle} |\langle \Phi_\rho | \Phi_\sigma \rangle|^2,$$

where the first maximization is over all purifications of each state, and the second is over purifications of $\rho$ for any fixed purification $|\Phi_\sigma\rangle$ of $\sigma$. This characterization is useful in two different ways. First, for any two states, it guarantees the existence of purifications of those states whose squared inner product equals the fidelity. Second, one can derive from it the following monotonicity property for an arbitrary trace-preserving channel $\mathcal{N}$:

$$F(\rho, \sigma) \le F(\mathcal{N}(\rho), \mathcal{N}(\sigma)). \tag{6.5}$$

An analogous property is shared by the trace distance [?]:

$$\|\rho - \sigma\|_1 \ge \|\mathcal{N}(\rho) - \mathcal{N}(\sigma)\|_1, \tag{6.6}$$

which holds even if $\mathcal{N}$ is trace-reducing. A simple proof for the trace-preserving case can be found in [33]. These inequalities reflect the fact that completely positive maps are contractive and cannot improve the distinguishability of quantum states; the closer states are to each other, the harder it is to tell them apart. Another useful property will be the multiplicativity of the fidelity under tensor products:

$$F(\rho_1 \otimes \rho_2, \sigma_1 \otimes \sigma_2) = F(\rho_1, \sigma_1) F(\rho_2, \sigma_2). \tag{6.7}$$

Since the trace distance comes from a norm, it satisfies the triangle inequality. The fidelity does not come from a norm, but it is possible to derive the following analog by applying (6.1) and (6.2) to the triangle inequality for the trace distance:

$$F(\rho_1, \rho_3) \ge 1 - 2\sqrt{1 - F(\rho_1, \rho_2)} - 2\sqrt{1 - F(\rho_2, \rho_3)}. \tag{6.8}$$
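Properties (6.7) and (6.8) admit the same kind of commuting-state spot check as (6.1) and (6.2). For diagonal states, the spectrum of a tensor product is the list of pairwise products, and $F = (\sum_i \sqrt{p_i q_i})^2$. To make (6.8) non-vacuous, the sketch builds a chain $\rho_1 \to \rho_2 \to \rho_3$ of nearby states by mixing in a small amount of random noise. That choice, and all function names, are illustrative assumptions.

```python
import math
import random


def fidelity(p, q):
    """Squared fidelity of commuting states: (sum_i sqrt(p_i q_i))^2."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q)) ** 2


def tensor(p, q):
    """Spectrum of the product state: all pairwise products p_i q_j."""
    return [a * b for a in p for b in q]


def random_spectrum(d, rng):
    w = [rng.random() for _ in range(d)]
    total = sum(w)
    return [x / total for x in w]


def nearby(p, t, rng):
    """Mix p with a random spectrum using weight t, giving a state close to p."""
    r = random_spectrum(len(p), rng)
    return [(1 - t) * a + t * b for a, b in zip(p, r)]


def check_6_7_and_6_8(trials=2000, seed=1, tol=1e-9):
    rng = random.Random(seed)
    for _ in range(trials):
        p1, q1 = random_spectrum(3, rng), random_spectrum(3, rng)
        p2, q2 = random_spectrum(2, rng), random_spectrum(2, rng)
        # (6.7): multiplicativity under tensor products
        lhs = fidelity(tensor(p1, p2), tensor(q1, q2))
        assert abs(lhs - fidelity(p1, q1) * fidelity(p2, q2)) <= tol
        # (6.8): triangle-like inequality along a chain of nearby states
        r1 = random_spectrum(4, rng)
        r2 = nearby(r1, 0.05 * rng.random(), rng)
        r3 = nearby(r2, 0.05 * rng.random(), rng)
        bound = (1 - 2 * math.sqrt(max(0.0, 1 - fidelity(r1, r2)))
                 - 2 * math.sqrt(max(0.0, 1 - fidelity(r2, r3))))
        assert fidelity(r1, r3) >= bound - tol
    return True
```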

It will be possible to obtain a sharper triangle-like inequality as a consequence of the 
following lemma, which states that if a measurement succeeds with high probability 
on a state, it will also do so on a state which is close to that state in trace distance. 

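Lemma 1. Suppose that $\rho, \sigma, \Lambda \in \mathbb{C}^{d \times d}$, where $\rho$ and $\sigma$ are density matrices, and $0 \le \Lambda \le 1$. Then $\operatorname{Tr} \Lambda\sigma \ge \operatorname{Tr} \Lambda\rho - \|\rho - \sigma\|_1$.

Proof.

$$\begin{aligned} \operatorname{Tr} \Lambda\sigma &= \operatorname{Tr} \Lambda\rho - \operatorname{Tr} \Lambda(\rho - \sigma) \\ &\ge \operatorname{Tr} \Lambda\rho - \max_{0 \le \Lambda' \le 1} 2 \operatorname{Tr} \Lambda'(\rho - \sigma) \\ &= \operatorname{Tr} \Lambda\rho - \|\rho - \sigma\|_1, \end{aligned}$$

where the last equality invokes a characterization of the trace distance between density matrices given in Section [?]. ■

Since $F(\phi, \rho) = \operatorname{Tr} \phi\rho$ when $\phi$ is a pure state, a corollary of Lemma 1 is a fact we will refer to as the "special triangle inequality."

Corollary (Special triangle inequality). For any pure state $\phi$ and density matrices $\rho$ and $\sigma$,

$$F(\phi, \sigma) \ge F(\phi, \rho) - \|\rho - \sigma\|_1.$$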
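Lemma 1 and the special triangle inequality can be spot-checked on qubits with real Bloch vectors in the $x$-$z$ plane. For such states, $\|\rho - \sigma\|_1$ equals the Euclidean distance between the Bloch vectors. An operator $0 \le \Lambda \le 1$ can be produced by rotating $\mathrm{diag}(a, b)$ with $a, b \in [0, 1]$. These parametrizations and function names are assumptions made for the sketch. Restricting to real qubits keeps it dependency-free, at the cost of generality.

```python
import math
import random


def qubit(rx, rz):
    """Real qubit density matrix (I + rx X + rz Z) / 2 with rx^2 + rz^2 <= 1."""
    return [[(1 + rz) / 2, rx / 2], [rx / 2, (1 - rz) / 2]]


def tr_prod(a, b):
    """Tr(AB) for 2x2 matrices given as nested lists."""
    return sum(a[i][j] * b[j][i] for i in range(2) for j in range(2))


def trace_norm_qubits(r, s):
    """||rho_r - rho_s||_1 equals the Euclidean distance of the Bloch vectors."""
    return math.hypot(r[0] - s[0], r[1] - s[1])


def random_bloch(rng, radius=None):
    """A random Bloch vector in the x-z disc (radius 1 gives a pure state)."""
    rad = math.sqrt(rng.random()) if radius is None else radius
    t = rng.uniform(0, 2 * math.pi)
    return rad * math.cos(t), rad * math.sin(t)


def random_effect(rng):
    """0 <= Lambda <= 1: rotate diag(a, b), with a, b in [0, 1], by a random angle."""
    a, b, t = rng.random(), rng.random(), rng.uniform(0, math.pi)
    c, s = math.cos(t), math.sin(t)
    return [[a * c * c + b * s * s, (a - b) * c * s],
            [(a - b) * c * s, a * s * s + b * c * c]]


def check_lemma_1(trials=2000, seed=3, tol=1e-12):
    rng = random.Random(seed)
    for _ in range(trials):
        r, s = random_bloch(rng), random_bloch(rng)
        rho, sigma, lam = qubit(*r), qubit(*s), random_effect(rng)
        dist = trace_norm_qubits(r, s)
        # Lemma 1: Tr(Lambda sigma) >= Tr(Lambda rho) - ||rho - sigma||_1
        assert tr_prod(lam, sigma) >= tr_prod(lam, rho) - dist - tol
        # Special triangle inequality with a pure phi, using F(phi, .) = Tr(phi .)
        phi = qubit(*random_bloch(rng, radius=1.0))
        assert tr_prod(phi, sigma) >= tr_prod(phi, rho) - dist - tol
    return True
```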



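The following lemma can be thought of either as a type of transitivity property inherent to any bipartite state with a component near a pure state, or as a partial converse to the monotonicity of fidelity.

Lemma 2. For arbitrary quantum systems $A$ and $B$, let $\phi^A$ be a pure state, $\rho^B$ a density matrix, and $\Omega^{AB}$ a density matrix of the composite system $AB$ with partial traces $\Omega^A = \operatorname{Tr}_B \Omega$ and $\Omega^B = \operatorname{Tr}_A \Omega$. Then

$$F(\phi \otimes \rho, \Omega) \ge 1 - \|\rho - \Omega^B\|_1 - 3\bigl(1 - F(\phi, \Omega^A)\bigr).$$

Proof. We begin by defining the subnormalized density matrix $\omega^B$ via the equation

$$(\phi \otimes 1)\, \Omega\, (\phi \otimes 1) = \phi \otimes \omega, \tag{6.9}$$

which we interpret as the upper-left block of $\Omega$ when the basis for $\mathbb{C}^{|A|}$ is chosen in such a way that $|\phi\rangle = (1, 0, \ldots, 0)^T$. Notice that $F(\phi, \Omega^A) = \operatorname{Tr} \omega = 1 - \epsilon$, which defines $\epsilon$. Writing the normalized state $\hat{\omega} = \omega / (1 - \epsilon)$, we see that it is close to $\omega$ in the sense that

$$\|\hat{\omega} - \omega\|_1 \le \epsilon \|\hat{\omega}\|_1 \le \epsilon. \tag{6.10}$$

Now we write

$$\begin{aligned} \sqrt{F(\phi \otimes \rho, \Omega)} &= \operatorname{Tr} \sqrt{\sqrt{\phi \otimes \rho}\, \Omega\, \sqrt{\phi \otimes \rho}} \\ &= \operatorname{Tr} \sqrt{(\phi \otimes \sqrt{\rho})\, \Omega\, (\phi \otimes \sqrt{\rho})} \\ &= \operatorname{Tr} \sqrt{\phi \otimes \sqrt{\rho}\, \omega \sqrt{\rho}} \\ &= \sqrt{F(\omega, \rho)} \\ &= \sqrt{(1 - \epsilon) F(\hat{\omega}, \rho)} \\ &\ge \sqrt{(1 - \epsilon)\bigl(1 - \|\rho - \hat{\omega}\|_1\bigr)}. \end{aligned} \tag{6.11}$$

The first line is the definition of fidelity and the third follows from (6.9). The last equality relies on the fact that the fidelity, as we have defined it, scales linearly when either input is multiplied by a nonnegative constant, i.e. $F(c\,\omega, \rho) = c\,F(\omega, \rho)$ for $c \ge 0$. The inequality follows from (6.2).

Noting that $\Omega^B \ge \omega$, we define another positive operator $\omega' = \Omega^B - \omega$, which satisfies $\operatorname{Tr} \omega' \le \epsilon$ and can be interpreted as the sum of the rest of the diagonal blocks of $\Omega$. The trace distance in the last line above can be bounded via double application of the triangle inequality as

$$\begin{aligned} \|\rho - \hat{\omega}\|_1 &\le \|\Omega^B - \omega\|_1 + \|\rho - \Omega^B\|_1 + \|\omega - \hat{\omega}\|_1 \\ &\le \operatorname{Tr} \omega' + \|\rho - \Omega^B\|_1 + \epsilon \\ &\le \|\rho - \Omega^B\|_1 + 2\epsilon, \end{aligned} \tag{6.12}$$

where the second line follows from (6.10). Combining (6.11) with (6.12), we obtain

$$\begin{aligned} F(\phi \otimes \rho, \Omega) &\ge (1 - \epsilon)\bigl(1 - \|\rho - \Omega^B\|_1 - 2\epsilon\bigr) \\ &\ge 1 - \|\rho - \Omega^B\|_1 - 3\epsilon, \end{aligned}$$

which is the claimed bound, since $\epsilon = 1 - F(\phi, \Omega^A)$. ■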
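As a spot check of Lemma 2, take the special case where $\Omega^{AB}$ is diagonal (a joint distribution $\omega[a][b]$), $\phi = |0\rangle\langle 0|$ on $A$, and $\rho$ is diagonal on $B$. In that case $F(\phi \otimes \rho, \Omega) = (\sum_b \sqrt{\rho_b\, \omega[0][b]})^2$ and $F(\phi, \Omega^A) = \sum_b \omega[0][b]$. The sketch compares both sides on random near-product distributions. The diagonal restriction and the helper names are assumptions for illustration only.

```python
import math
import random


def lemma2_sides(omega, rho):
    """Both sides of Lemma 2 for diagonal Omega^{AB} (a joint distribution
    omega[a][b]), phi = |0><0| on A and diagonal rho on B.

    Returns (F(phi (x) rho, Omega), 1 - ||rho - Omega^B||_1 - 3(1 - F(phi, Omega^A))).
    """
    # phi (x) rho is diagonal with weight rho[b] on (0, b) and zero elsewhere.
    fid = sum(math.sqrt(rho[b] * omega[0][b]) for b in range(len(rho))) ** 2
    omega_b = [sum(row[b] for row in omega) for b in range(len(rho))]
    f_a = sum(omega[0])  # F(phi, Omega^A) = <0|Omega^A|0>
    bound = 1 - sum(abs(x - y) for x, y in zip(rho, omega_b)) - 3 * (1 - f_a)
    return fid, bound


def random_spectrum(d, rng):
    w = [rng.random() for _ in range(d)]
    total = sum(w)
    return [x / total for x in w]


def check_lemma_2(trials=2000, da=3, db=3, seed=4, tol=1e-12):
    rng = random.Random(seed)
    for _ in range(trials):
        rho = random_spectrum(db, rng)
        noise = random_spectrum(da * db, rng)
        t = 0.1 * rng.random()  # small t keeps Omega close to phi (x) rho
        omega = [[(1 - t) * (rho[b] if a == 0 else 0.0) + t * noise[a * db + b]
                  for b in range(db)] for a in range(da)]
        fid, bound = lemma2_sides(omega, rho)
        assert fid >= bound - tol
    return True
```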



6.2 Other useful lemmas 

The following continuity lemma from [3] shows that if two bipartite states are close to each other, the difference between their coherent informations is small.

Lemma 3 (Continuity of coherent information). Let $\rho^{AB}$ and $\sigma^{AB}$ be two states of a finite-dimensional bipartite system $AB$ satisfying $\|\rho - \sigma\|_1 \le \epsilon$. Then

$$|I_c(A \rangle B)_\rho - I_c(A \rangle B)_\sigma| \le 2H(\epsilon) + 4\epsilon \log |A|,$$

where $H(\epsilon)$ is the binary entropy function.

Next is Winter's "gentle measurement" lemma [?], which implies that a measurement which is likely to succeed in identifying a state does not significantly disturb that state.

Lemma 4 (Gentle measurement). Let a density matrix $\rho^A$ be given, where $|A|$ is finite. If $\Lambda \in \mathbb{C}^{|A| \times |A|}$ is nonnegative with spectrum bounded above by 1, then

$$\operatorname{Tr} \rho\Lambda \ge 1 - \epsilon$$

implies

$$\|\rho - \sqrt{\Lambda}\, \rho \sqrt{\Lambda}\|_1 \le \sqrt{8\epsilon}.$$
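The square-root dependence on $\epsilon$ in Lemma 4 is visible already for a qubit. The conclusion is assumed here in Winter's form $\|\rho - \sqrt{\Lambda}\rho\sqrt{\Lambda}\|_1 \le \sqrt{8\epsilon}$. Take $\rho = |\psi\rangle\langle\psi|$ with $|\psi\rangle = (\cos\theta, \sin\theta)$ and $\Lambda = |0\rangle\langle 0|$. Then $\epsilon = \sin^2\theta$, and the disturbance equals $\sin\theta\sqrt{4 - 3\sin^2\theta} \approx 2\sqrt{\epsilon}$, which a linear bound in $\epsilon$ could not capture. The sketch also checks the bound on random real qubit states and effects. All helpers are hypothetical, and the restriction to real $2 \times 2$ matrices is only for simplicity.

```python
import math
import random


def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace_norm_sym(m):
    """Trace norm of a real symmetric 2x2 matrix, from its two eigenvalues."""
    mean = (m[0][0] + m[1][1]) / 2
    rad = math.hypot((m[0][0] - m[1][1]) / 2, m[0][1])
    return abs(mean + rad) + abs(mean - rad)


def rotated_diag(a, b, t):
    """R(t) diag(a, b) R(t)^T, a real symmetric matrix with eigenvalues a and b."""
    c, s = math.cos(t), math.sin(t)
    return [[a * c * c + b * s * s, (a - b) * c * s],
            [(a - b) * c * s, a * s * s + b * c * c]]


def gentle_gap(rho, a, b, t):
    """Return (eps, ||rho - sqrt(L) rho sqrt(L)||_1) for L = rotated_diag(a, b, t),
    where eps = 1 - Tr(rho L) and 0 <= a, b <= 1."""
    lam = rotated_diag(a, b, t)
    root = rotated_diag(math.sqrt(a), math.sqrt(b), t)  # sqrt(L), same eigenbasis
    post = mat_mul(mat_mul(root, rho), root)
    eps = 1 - sum(lam[i][j] * rho[j][i] for i in range(2) for j in range(2))
    diff = [[rho[i][j] - post[i][j] for j in range(2)] for i in range(2)]
    return eps, trace_norm_sym(diff)


def check_gentle_measurement(trials=2000, seed=6, tol=1e-12):
    rng = random.Random(seed)
    for _ in range(trials):
        r, phi = math.sqrt(rng.random()), rng.uniform(0, 2 * math.pi)
        rx, rz = r * math.cos(phi), r * math.sin(phi)
        rho = [[(1 + rz) / 2, rx / 2], [rx / 2, (1 - rz) / 2]]
        eps, gap = gentle_gap(rho, rng.random(), rng.random(), rng.uniform(0, math.pi))
        assert gap <= math.sqrt(8 * max(eps, 0.0)) + tol
    return True
```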




We will also need a lemma from classical information theory which bounds the conditional entropy of two random variables with the same support in terms of the probability that they differ.

Lemma 5 (Fano's inequality). Let $M, \hat{M}$ be $\mathcal{M}$-valued random variables, and write $P_e = \Pr\{M \ne \hat{M}\}$. Then

$$H(M | \hat{M}) \le H(P_e) + P_e \log |\mathcal{M}|.$$

Proof. See [?]. ■
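A worked instance of Fano's inequality: let $M$ be uniform on $k$ values, and let $\hat{M} = M$ with probability $1 - P_e$, otherwise uniform over the other $k - 1$ values. By symmetry, $H(M|\hat{M}) = H(P_e) + P_e \log(k - 1)$, which sits just below the bound $H(P_e) + P_e \log k$. The sketch computes $H(M|\hat{M})$ directly from the joint distribution. This symmetric error model is our own illustrative choice.

```python
import math


def binary_entropy(p):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def conditional_entropy(joint):
    """H(M | Mhat) in bits for a joint distribution {(m, mhat): prob}."""
    marginal = {}
    for (_, mhat), pr in joint.items():
        marginal[mhat] = marginal.get(mhat, 0.0) + pr
    return -sum(pr * math.log2(pr / marginal[mhat])
                for (_, mhat), pr in joint.items() if pr > 0)


def symmetric_error_joint(k, pe):
    """M uniform on k values; Mhat = M with prob. 1 - pe, else uniform on the rest."""
    return {(m, mh): ((1 - pe) / k if m == mh else pe / (k * (k - 1)))
            for m in range(k) for mh in range(k)}


def fano_bound(pe, k):
    """Right-hand side of Lemma 5: H(P_e) + P_e log|M|."""
    return binary_entropy(pe) + pe * math.log2(k)
```

With $k = 4$ and $P_e = 0.2$, the exact value is about $1.039$ bits, against a Fano bound of about $1.122$ bits.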

6.3 Strong subadditivity and its consequences 

In this section, we recall an inequality which holds for any tripartite quantum system $ABC$. This inequality goes by the name strong subadditivity, was originally proved in [22], and states that

$$H(AB) + H(BC) \ge H(B) + H(ABC). \tag{6.13}$$

As much has been written about the proof of strong subadditivity of quantum entropy (see e.g. [?]), we will not discuss the proof of the theorem here. Rather, we will show how strong subadditivity can be used as a mathematical "hammer of Thor," enabling short and elegant proofs of many known entropy inequalities in quantum information theory. In fact, many of these results will turn out to be equivalent to strong subadditivity, in the sense that the latter is easily derivable from them.

Begin by subtracting $H(B) + H(BC)$ from either side of (6.13) to yield

$$H(A|B) \ge H(A|BC). \tag{6.14}$$

This inequality can be interpreted as a demonstration that conditioning reduces entropy. Collecting the terms on a single side yields the compact formula

$$I(A; C | B) \ge 0,$$

showing that the quantum conditional mutual information is always nonnegative. Note that in the classical case, these inequalities are quite simple to prove [?]. For instance, nonnegativity of $I(X; Z | Y)$ follows from nonnegativity of mutual information which, in turn, is a consequence of nonnegativity of the Kullback-Leibler distance $D(P \| Q)$.
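The classical case of (6.13) can be checked in a few lines by computing $I(A;C|B) = H(AB) + H(BC) - H(B) - H(ABC)$ for random joint distributions of three variables. The sketch below does exactly that, with coordinates ordered $(A, B, C)$. It illustrates only the classical special case, not the quantum theorem. The helper names are our own.

```python
import itertools
import math
import random


def entropy_of(joint, keep):
    """Shannon entropy (bits) of the marginal on the coordinate positions in keep."""
    marg = {}
    for outcome, pr in joint.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + pr
    return -sum(pr * math.log2(pr) for pr in marg.values() if pr > 0)


def random_joint(dims, rng):
    """A random joint distribution on the product of ranges given by dims."""
    outcomes = list(itertools.product(*(range(d) for d in dims)))
    w = [rng.random() for _ in outcomes]
    total = sum(w)
    return {o: x / total for o, x in zip(outcomes, w)}


def conditional_mutual_information(joint):
    """I(A;C|B) = H(AB) + H(BC) - H(B) - H(ABC), coordinates ordered (A, B, C)."""
    return (entropy_of(joint, (0, 1)) + entropy_of(joint, (1, 2))
            - entropy_of(joint, (1,)) - entropy_of(joint, (0, 1, 2)))


def check_ssa(trials=500, seed=5, tol=1e-10):
    """Strong subadditivity (6.13) is equivalent to I(A;C|B) >= 0."""
    rng = random.Random(seed)
    return all(conditional_mutual_information(random_joint((2, 3, 2), rng)) >= -tol
               for _ in range(trials))
```

Two boundary cases help calibrate the check. A product distribution gives $I(A;C|B) = 0$. Setting $A = C$ to a uniform bit with $B$ constant gives $I(A;C|B) = 1$.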

Classically, it is simple to show that $I(X; Z | Y) = 0$ if and only if $X - Y - Z$ forms a Markov chain in that order. Necessary and sufficient conditions for saturation of quantum strong subadditivity were recently determined in [?], where it was shown that $I(A; C | B) = 0$ if and only if

$$\rho^{ABC} = \bigoplus_x q(x)\, \rho_x^{A B_x^L} \otimes \rho_x^{B_x^R C},$$

where $B = \bigoplus_x B_x^L \otimes B_x^R$. In other words, saturation occurs if and only if there is a local measurement that can be performed on $B$ which determines $x$ without disturbing the global state. Conditioned on knowing $x$, the global system is in a product state. Such a measurement is commonly known as a "which path" measurement.
Recall the definition of coherent information as

$$I_c(A \rangle B) = -H(A|B).$$

Simply re-expressing (6.14) in terms of coherent information, after interchanging the roles of $B$ and $C$, yields the inequality

$$I_c(A \rangle BC) \ge I_c(A \rangle C), \tag{6.15}$$

which can be interpreted, in light of the quantum capacity theorem, as saying that losing access to part of the output of a quantum channel cannot increase capacity. Observe that a similar property is obeyed by the classical mutual information, namely that

$$I(X; YZ) \ge I(X; Y).$$

More generally, coherent information can be shown to obey an analog of the classical data processing inequality (see e.g. [10]), which says that if $X - Y - Z$ is a Markov chain, then

$$I(X; Y) \ge I(X; Z).$$
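Both classical facts in the two paragraphs above can be illustrated on a randomly generated Markov chain $p(x, y, z) = p(x)\,p(y|x)\,p(z|y)$. For such a chain, $I(X;Z|Y)$ vanishes up to rounding, and the data processing inequality $I(X;Y) \ge I(X;Z)$ holds. The sketch is a numerical illustration under these classical assumptions. The names and alphabet sizes are arbitrary.

```python
import itertools
import math
import random


def entropy_of(joint, keep):
    """Shannon entropy (bits) of the marginal on the coordinate positions in keep."""
    marg = {}
    for outcome, pr in joint.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + pr
    return -sum(pr * math.log2(pr) for pr in marg.values() if pr > 0)


def random_stochastic(rows, cols, rng):
    """Row i is a random conditional distribution over cols outcomes."""
    matrix = []
    for _ in range(rows):
        w = [rng.random() for _ in range(cols)]
        total = sum(w)
        matrix.append([x / total for x in w])
    return matrix


def markov_joint(px, pyx, pzy):
    """p(x, y, z) = p(x) p(y|x) p(z|y), so that X - Y - Z is a Markov chain."""
    return {(x, y, z): px[x] * pyx[x][y] * pzy[y][z]
            for x, y, z in itertools.product(range(len(px)), range(len(pzy)),
                                             range(len(pzy[0])))}


def mutual_information(joint, left, right):
    return (entropy_of(joint, left) + entropy_of(joint, right)
            - entropy_of(joint, left + right))


def check_markov(trials=500, seed=7, tol=1e-10):
    rng = random.Random(seed)
    for _ in range(trials):
        px = random_stochastic(1, 3, rng)[0]
        joint = markov_joint(px, random_stochastic(3, 4, rng),
                             random_stochastic(4, 2, rng))
        cmi = (entropy_of(joint, (0, 1)) + entropy_of(joint, (1, 2))
               - entropy_of(joint, (1,)) - entropy_of(joint, (0, 1, 2)))
        assert abs(cmi) <= tol                      # I(X;Z|Y) = 0
        assert (mutual_information(joint, (0,), (1,))
                >= mutual_information(joint, (0,), (2,)) - tol)  # data processing
    return True
```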

A quantum version of the data processing inequality [12] can be proved easily from strong subadditivity.

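Lemma 6 (Quantum data processing inequality). Let a bipartite density matrix $\rho^{AB}$ and a channel $\mathcal{N}: B \to C$ be given. Then

$$I_c(A \rangle B)_\rho \ge I_c(A \rangle C)_{\mathcal{N}(\rho)}.$$

Proof. Choose any isometric extension $U_{\mathcal{N}}: B \to CE$ of $\mathcal{N}$. Then

$$I_c(A \rangle B)_\rho = I_c(A \rangle CE)_{U_{\mathcal{N}}(\rho)} \ge I_c(A \rangle C)_{\mathcal{N}(\rho)},$$

where the first step is because isometries preserve entropy, while the second is by (6.15). ■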

It is thus apparent that post-processing of $B$ can never increase coherence with $A$. It is also possible to derive strong subadditivity from data processing by taking $\mathcal{N}$ to be a partial trace, so the data processing inequality is another equivalent way to express strong subadditivity.

The quantum data processing inequality can be used to derive a more direct analog [2] of the classical data processing inequality, dealing with quantum mutual information rather than coherent information. A simple corollary of Lemma 6 is the following.

Corollary (Quantum mutual information data processing inequality). With the same conditions as in Lemma 6,

$$I(A; B)_\rho \ge I(A; C)_{\mathcal{N}(\rho)}.$$

Proof. The conclusion of Lemma 6 can be rewritten in terms of conditional entropies as

$$-H(A|B)_\rho \ge -H(A|C)_{\mathcal{N}(\rho)}.$$

Adding $H(A)$, which is the same for both states because $\mathcal{N}$ does not act on $A$, to each side yields the required inequality. ■

As a simple consequence of this corollary, we obtain a completely elementary proof of the Holevo bound [21], an essential step in the converse part of the HSW capacity theorem.

Lemma 7 (Holevo bound). Let a cq state

$$\rho^{XB} = \sum_x p(x) |x\rangle\langle x|^X \otimes \rho_x^B$$

be given. For any measurement on $B$ with POVM $\{\Lambda_y\}_{y \in \mathcal{Y}}$, the following inequality holds:

$$I(X; B)_\rho \ge I(X; Y).$$

Proof. Construct a measuring instrument $\mathcal{N}: B \to Y$ (as in Section [?]), acting as

$$\mathcal{N}: \sigma \mapsto \sum_y \operatorname{Tr}(\Lambda_y \sigma)\, |y\rangle\langle y|^Y.$$

Application of the previous version of the data processing inequality proves the result. ■
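A small example of the Holevo bound uses the ensemble $\{1/2, |0\rangle\}$, $\{1/2, |+\rangle\}$. Because the states are pure, $I(X;B) = S(\bar{\rho}) \approx 0.601$ bits, where $\bar{\rho}$ is the average state. Measuring in the computational basis gives $I(X;Y) = H(3/4) - 1/2 \approx 0.311$ bits. The sketch sweeps projective measurements in real rotated bases and confirms that none exceeds the Holevo quantity. Restricting to real qubit vectors and projective measurements is our own simplification; Lemma 7 covers general POVMs.

```python
import math


def entropy_bits(eigs):
    """Von Neumann entropy (bits) from a list of eigenvalues."""
    return -sum(v * math.log2(v) for v in eigs if v > 1e-15)


def sym2_eigs(m):
    """Eigenvalues of a real symmetric 2x2 matrix."""
    mean = (m[0][0] + m[1][1]) / 2
    rad = math.hypot((m[0][0] - m[1][1]) / 2, m[0][1])
    return mean + rad, mean - rad


def holevo_chi_pure(ensemble):
    """chi = S(sum_x p(x) psi_x) for pure real qubit states [(p(x), (a, b)), ...];
    the term sum_x p(x) S(psi_x) vanishes because each psi_x is pure."""
    avg = [[0.0, 0.0], [0.0, 0.0]]
    for p, v in ensemble:
        for i in range(2):
            for j in range(2):
                avg[i][j] += p * v[i] * v[j]
    return entropy_bits(sym2_eigs(avg))


def measured_information(ensemble, t):
    """I(X;Y) for the projective measurement onto (cos t, sin t), (-sin t, cos t)."""
    basis = [(math.cos(t), math.sin(t)), (-math.sin(t), math.cos(t))]
    joint = {(x, y): p * (m[0] * v[0] + m[1] * v[1]) ** 2
             for x, (p, v) in enumerate(ensemble) for y, m in enumerate(basis)}
    px = {x: sum(joint[(x, y)] for y in range(2)) for x in range(len(ensemble))}
    py = {y: sum(joint[(x, y)] for x in range(len(ensemble))) for y in range(2)}
    return sum(pr * math.log2(pr / (px[x] * py[y]))
               for (x, y), pr in joint.items() if pr > 1e-15)


ENSEMBLE = [(0.5, (1.0, 0.0)), (0.5, (1 / math.sqrt(2), 1 / math.sqrt(2)))]
```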
The following inequality from [31] will be useful in Section 9.2, where we give a proof that degradable quantum channels have single-letter quantum capacities.

Lemma 8 (Joint subadditivity of conditional entropy). For any quadripartite state on $ABCD$, the following entropy inequality applies:

$$H(AB | CD) \le H(A|C) + H(B|D). \tag{6.16}$$

Proof. Using the original formulation of strong subadditivity (6.13), we may write the following two inequalities:

$$H(ABCD) + H(C) \le H(AC) + H(BCD),$$

$$H(BCD) \le H(BD) + H(CD) - H(D).$$

Combining these gives

$$H(ABCD) + H(C) \le H(AC) + H(BD) + H(CD) - H(D).$$

Rearranging terms gives the required result. ■
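As with (6.13), the classical special case of Lemma 8 is easy to probe numerically. The sketch computes the gap $H(A|C) + H(B|D) - H(AB|CD)$ for random distributions on four binary variables and checks that it is nonnegative. The gap vanishes when the pairs $(A, C)$ and $(B, D)$ are independent. The helper names are hypothetical, and the check says nothing about the quantum case beyond illustrating the statement.

```python
import itertools
import math
import random


def entropy_of(joint, keep):
    """Shannon entropy (bits) of the marginal on the coordinate positions in keep."""
    marg = {}
    for outcome, pr in joint.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + pr
    return -sum(pr * math.log2(pr) for pr in marg.values() if pr > 0)


def random_joint(dims, rng):
    outcomes = list(itertools.product(*(range(d) for d in dims)))
    w = [rng.random() for _ in outcomes]
    total = sum(w)
    return {o: x / total for o, x in zip(outcomes, w)}


def joint_subadditivity_gap(joint):
    """H(A|C) + H(B|D) - H(AB|CD), coordinates ordered (A, B, C, D)."""
    h_ab_given_cd = entropy_of(joint, (0, 1, 2, 3)) - entropy_of(joint, (2, 3))
    h_a_given_c = entropy_of(joint, (0, 2)) - entropy_of(joint, (2,))
    h_b_given_d = entropy_of(joint, (1, 3)) - entropy_of(joint, (3,))
    return h_a_given_c + h_b_given_d - h_ab_given_cd


def check_joint_subadditivity(trials=300, seed=8, tol=1e-10):
    rng = random.Random(seed)
    return all(joint_subadditivity_gap(random_joint((2, 2, 2, 2), rng)) >= -tol
               for _ in range(trials))
```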

Observe that this lemma can equivalently be expressed in terms of coherent information as

$$I_c(A \rangle C) + I_c(B \rangle D) \le I_c(AB \rangle CD). \tag{6.17}$$

Note that if (6.17) is computed on a state of the form

$$\rho^{ABCD} = \rho^{ACD} \otimes \phi^B,$$

where $\phi^B$ is a pure state, it follows that

$$I_c(B \rangle D) = H(D) - H(BD) = H(D) - H(D) = 0$$

and

$$I_c(AB \rangle CD) = H(CD) - H(ABCD) = H(CD) - H(ACD) = I_c(A \rangle CD),$$

implying that

$$I_c(A \rangle C) \le I_c(A \rangle CD),$$

which is just the original strong subadditivity inequality we started with. So we see that strong subadditivity is equivalent to Lemma 8 as well as to the fact that coherent information is superadditive.



Chapter 7 

Entanglement generation capacities 



As a first step towards proving the theorems stated in Chapter 5, we introduce a less restrictive communication scenario: entanglement generation. The criterion of strong subspace transmission is analogous to a classical requirement that the maximal error probability be small. The entanglement generation criterion will instead be related to an average error constraint on good codes.

Classical-quantum scenario. Alice sends classical information to Charlie at rate $R$, while Bob sends quantum information at rate $Q$. Rather than being required to transmit half of any quantum state Bob is presented with, Bob will only need to create near-maximal quantum correlations with Charlie at rate $Q$. To this end, Bob begins by preparing a bipartite pure state $|\Upsilon\rangle^{BB'^n}$, entangled between a physical system $B$ located in his laboratory and the $B'^n$ part of the inputs of $\mathcal{N}^{\otimes n}$.

At the same time, Charlie will only need to identify Alice's classical message with 
a low average error probability, averaged over all of Alice's classical messages. As with 
strong subspace transmission, Charlie's post-processing procedure will be modeled by 
a quantum instrument. While the outer bound provided by our converse theorem will 
apply to any decoding modeled by an instrument, the achievability proof will require 
a less general approach, consisting of the following steps. 

In order to ascertain Alice's message $M$, Charlie first performs some measurement on $C^n$, whose statistics are given by a POVM $\{\Lambda_m\}_{m \in 2^{nR}}$. We denote the result of that measurement by $\hat{M}$, his declaration of the message sent by Alice. Based on the result of that measurement, he will perform one of $2^{nR}$ decoding operations $\mathcal{D}_m : C^n \to \hat{B}$. These two steps can be mathematically combined to define a quantum instrument $\mathcal{D}: C^n \to \hat{M}\hat{B}$ with (trace-reducing) components

$$\mathcal{D}^m : \Gamma \mapsto \mathcal{D}_m\bigl(\sqrt{\Lambda_m}\, \Gamma \sqrt{\Lambda_m}\bigr).$$

The instrument acts as

$$\mathcal{D} : \Gamma \mapsto \sum_{m=1}^{2^{nR}} |m\rangle\langle m|^{\hat{M}} \otimes \mathcal{D}^m(\Gamma),$$

and induces the trace-preserving map $\bar{\mathcal{D}} : C^n \to \hat{B}$, acting according to

$$\bar{\mathcal{D}} : \Gamma \mapsto \sum_{m=1}^{2^{nR}} \mathcal{D}^m(\Gamma).$$

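We again remark that this is the most general decoding procedure required of Charlie. Any procedure in which he iterates the above steps (measuring, manipulating, measuring again, and so on) is asymptotically no better than a single instance of the protocol above. This is because the inner and outer bounds provided by the coding theorem and converse coincide. Let $\phi_m^{A'^n}$ denote the state Alice inputs to encode message $m$, and let $|\Phi\rangle^{B\hat{B}}$ be maximally entangled. Then $(\{\phi_m\}_{m \in 2^{nR}}, \Upsilon^{BB'^n}, \mathcal{D})$ will be called an $(R, Q, n, \epsilon)$ cq entanglement generation code for the channel $\mathcal{N}$ if

$$2^{-nR} \sum_{m=1}^{2^{nR}} P(m, \Upsilon) \ge 1 - \epsilon, \tag{7.1}$$

where

$$P(m, \Upsilon) = F\Bigl(|m\rangle|\Phi\rangle^{\hat{M}B\hat{B}},\, \mathcal{D} \circ \mathcal{N}^{\otimes n}\bigl(\phi_m^{A'^n} \otimes \Upsilon^{BB'^n}\bigr)\Bigr). \tag{7.2}$$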

We will say that $(R, Q)$ is an achievable cq rate pair for entanglement generation if there exists a sequence of $(R, Q, n, \epsilon_n)$ cq entanglement generation codes with $\epsilon_n \to 0$. The capacity region $CQ_{eg}(\mathcal{N})$ is defined to be the closure of the collection of all achievable cq rate pairs for entanglement generation.

Quantum-quantum scenario. As above, Alice and Bob are no longer required to transmit arbitrary quantum correlations with which they are presented. Rather, each has the goal of creating near-maximal entanglement with Charlie. For encoding, Alice and Bob respectively prepare the states $|\Upsilon_1\rangle^{AA'^n}$ and $|\Upsilon_2\rangle^{BB'^n}$, entangled with the $A'^n$ and $B'^n$ parts of the inputs of $\mathcal{N}^{\otimes n}$. Their goal is to do this in such a way that Charlie, after applying a suitable decoding operation $\mathcal{D} : C^n \to \hat{A}\hat{B}$, holds the $\hat{A}\hat{B}$ part of a state which is close to $|\Phi_1\rangle^{A\hat{A}}|\Phi_2\rangle^{B\hat{B}}$. Formally, $(\Upsilon_1^{AA'^n}, \Upsilon_2^{BB'^n}, \mathcal{D})$ is a $(Q_a, Q_b, n, \epsilon)$ qq entanglement generation code for the channel $\mathcal{N}$ if

$$F\bigl(\Phi_1 \otimes \Phi_2,\, \mathcal{D} \circ \mathcal{N}^{\otimes n}(\Upsilon_1 \otimes \Upsilon_2)\bigr) \ge 1 - \epsilon. \tag{7.3}$$

$(Q_a, Q_b)$ is an achievable qq rate pair for entanglement generation if there is a sequence of $(Q_a, Q_b, n, \epsilon_n)$ qq entanglement generation codes with $\epsilon_n \to 0$. The capacity region $Q_{eg}(\mathcal{N})$ is the closure of the collection of all such achievable rates.

7.1 The coding theorems 

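For any quantum multiple access channel $\mathcal{N}: A'B' \to C$, we first prove that the single-letter regions $CQ^{(1)}(\mathcal{N})$ and $Q^{(1)}(\mathcal{N})$, defined as the restrictions to $k = 1$ of the respective characterizations from Sections 5.3 and 5.4, are respectively contained in $CQ_{eg}(\mathcal{N})$ and in $Q_{eg}(\mathcal{N})$. Applying the coding theorems to the extensions $\mathcal{N}^{\otimes k}$ of $\mathcal{N}$, it will then follow that

$$\overline{\bigcup_{k=1}^{\infty} \tfrac{1}{k} CQ^{(1)}(\mathcal{N}^{\otimes k})} \subseteq CQ_{eg}(\mathcal{N}) \quad\text{and}\quad \overline{\bigcup_{k=1}^{\infty} \tfrac{1}{k} Q^{(1)}(\mathcal{N}^{\otimes k})} \subseteq Q_{eg}(\mathcal{N}).$$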

Proof of Theorem 1 (coding theorem). Our proof of the coding theorem works as follows. We will employ random HSW codes and random entanglement generation codes to ensure that the average state at the input of $\mathcal{N}^{\otimes n}$ is close to a product state. Each sender will use a code designed for the product channel induced by the other's random input, so that existing coding theorems for product channels can be invoked. The quantum code used will be one which achieves the capacity of a modified channel, in which the classical input is copied, without error, to the output of the channel. As the random HSW codes will exactly induce a product state input, the existence of these quantum codes will follow directly from Proposition 2.

The random HSW codes will be those which exist for product channels. As random 
entanglement generation codes exist with average code density matrix arbitrarily close 
to a product state, this will ensure that the resulting output states are distinguishable 
with high probability. Furthermore, obtaining the classical information will be shown 
to cause but a small disturbance in the overall joint quantum state of the system. 
As we will show, it is possible to mimic the channel for which the quantum code 
is designed by placing the identities of the estimated classical message states into 
registers appended to the outputs of each channel in the product. 

The decoder for the modified channel will then be shown to define a quantum 
instrument which satisfies the success condition for a cq entanglement transmission 
code, on average. This feature will then be used to infer the existence of a particular, 
deterministic code which meets the same requirement. 

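Fix a pure state ensemble $\{p(x), |\phi_x\rangle^{A'}\}$ and a bipartite pure state $|\Psi\rangle^{BB'}$ which give rise to the cq state

$$\omega^{XBC} = (1^X \otimes \mathcal{N})\bigl(\phi^{XA'} \otimes \Psi^{BB'}\bigr), \qquad \phi^{XA'} = \sum_x p(x) |x\rangle\langle x|^X \otimes \phi_x^{A'},$$

which has the form of (5.1). Define $\rho_1 = \sum_x p(x) \phi_x$ and $\rho_2 = \operatorname{Tr}_B \Psi$. We will demonstrate the achievability of the corner point $(I(X; C)_\omega, I_c(B \rangle CX)_\omega)$. To do so, we show that for every $\epsilon, \delta > 0$, if $R = I(X; C)_\omega - \delta$ and $Q = I_c(B \rangle CX)_\omega - \delta$, there exists an $(R, Q, n, \epsilon)$ cq entanglement generation code for the channel $\mathcal{N}$, provided that $n$ is sufficiently large. The rest of the region will follow by timesharing.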

For encoding, Alice will choose $2^{nR}$ sequences $\mathcal{C} = \{x^n(m)\}_{m \in 2^{nR}}$, i.i.d. according to the product distribution $p^n(x^n) = \prod_{i=1}^n p(x_i)$. Each sequence corresponds to a preparation of channel inputs $|\phi_m\rangle^{A'^n} = |\phi_{x_1(m)}\rangle \otimes \cdots \otimes |\phi_{x_n(m)}\rangle$. Hence the expected average density operator associated with Alice's input to the channel is precisely

$$\mathbb{E}_{\mathcal{C}}\, 2^{-nR} \sum_{m=1}^{2^{nR}} |\phi_m\rangle\langle\phi_m| = \sum_{x^n} p^n(x^n) |\phi_{x^n}\rangle\langle\phi_{x^n}| = \rho_1^{\otimes n}.$$

Define a new channel $\mathcal{N}' : B' \to CX$ (which is also an instrument) by

$$\mathcal{N}' : \rho \mapsto \sum_x p(x)\, \mathcal{N}(\phi_x \otimes \rho) \otimes |x\rangle\langle x|^X.$$

This can be interpreted as a channel which reveals the identity of Alice's input state to Charlie, with the added assumption that Alice chooses her inputs at random. Alternatively, one can view this as a channel with state information available to the receiver, where nature randomly chooses the "state" $x$ at Alice's input. By Proposition [?], there exists a $(Q, n, \epsilon)$ random entanglement generation code $\{q_\beta, |\Upsilon_\beta\rangle^{BB'^n}, \mathcal{D}_\beta\}$ for the channel $\mathcal{N}'$ with average code density operator $\varrho^{B'^n} = \sum_\beta q_\beta \operatorname{Tr}_B \Upsilon_\beta$ satisfying

$$\|\varrho - \rho_2^{\otimes n}\|_1 \le \epsilon.$$

In what follows, we will use the shorthand $|\Upsilon\rangle$ for the random vector which takes the value $|\Upsilon_\beta\rangle$ with probability $q_\beta$. We further abbreviate

$$\mathbb{E}_\beta Z = \sum_\beta q_\beta Z_\beta,$$

where $Z$ is any function of the random vector $\Upsilon$.

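Now consider the channel $\mathcal{N}_1 : \rho \mapsto \mathcal{N}(\rho \otimes \rho_2)$, which would result if Bob's average code density operator were exactly equal to $\rho_2^{\otimes n}$. By Proposition [?], for this channel there exists a decoding POVM $\{\Lambda_m\}_{m \in 2^{nR}}$ which would identify Alice's index $m$ with expected average probability of error less than $\epsilon$, in the sense that

$$\mathbb{E}_{\mathcal{C}}\, 2^{-nR} \sum_{m=1}^{2^{nR}} \operatorname{Tr} \Lambda_m \Gamma'_m \ge 1 - \epsilon,$$

where

$$\Gamma'_m = \mathcal{N}^{\otimes n}(\phi_m \otimes \rho_2^{\otimes n}).$$

By the symmetry of the random code construction, we use (4.3) to write this as

$$\mathbb{E}_{\mathcal{C}} \operatorname{Tr} \Lambda_1 \Gamma'_1 \ge 1 - \epsilon.$$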

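Define the actual output of the channel corresponding to $M = m$ as

$$\Gamma_m = \mathcal{N}^{\otimes n}(\phi_m \otimes \operatorname{Tr}_B \Upsilon),$$

as well as its extension

where $|\Phi\rangle^{B\hat{B}}$ is the maximally entangled state which Bob is required to transmit. Note that

$$\mathbb{E}_\beta \Gamma_m = \operatorname{Tr}_B \mathbb{E}_\beta \Theta_m = \mathcal{N}^{\otimes n}(\phi_m \otimes \varrho).$$

It follows from monotonicity of trace distance that

$$\|\mathbb{E}_\beta \Gamma_m - \Gamma'_m\|_1 \le \epsilon,$$

which, together with Lemma 1, implies that

$$\mathbb{E}_{\mathcal{C}}\, 2^{-nR} \sum_{m=1}^{2^{nR}} \mathbb{E}_\beta \operatorname{Tr} \Lambda_m \Gamma_m = \mathbb{E}_{\mathcal{C}\beta} \operatorname{Tr} \Lambda_1 \Gamma_1 \ge 1 - 2\epsilon.$$

This allows us to bound the expected probability of correctly decoding Alice's message as

$$\mathbb{E}_{\mathcal{C}\beta} \operatorname{Tr}(1 \otimes \Lambda_1)\Theta_1 \ge 1 - 2\epsilon. \tag{7.4}$$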

In order to decode, Charlie begins by performing the measurement $\{\Lambda_m\}_{m \in 2^{nR}}$. He declares Alice's message to be $\hat{M} = m$ if measurement result $m$ is obtained. Charlie will then attempt to simulate the channel $\mathcal{N}'^{\otimes n}$. He associates a separate classical register $X_i$ to each channel $\mathcal{N} : A'_i B'_i \to C_i$ in the product and prepares the states $|x_i(m)\rangle^{X_i}$ for each $1 \le i \le n$. Additionally, he stores the result of the measurement in the system $\hat{M}$, his declaration of the message intended by Alice. This procedure results in the global state

$$\Gamma^{BC^nX^n\hat{M}} = \sum_{m=1}^{2^{nR}} \bigl(1 \otimes \sqrt{\Lambda_m}\bigr)\Theta_1\bigl(1 \otimes \sqrt{\Lambda_m}\bigr) \otimes |x^n(m)\rangle\langle x^n(m)|^{X^n} \otimes |m\rangle\langle m|^{\hat{M}}.$$

Let $\Theta^{BC^nX^n} = \operatorname{Tr}_{\hat{M}} \Gamma$. If Charlie were able to reconstruct Alice's classical message perfectly, $\Gamma$ would instead be

$$\Gamma' = \Theta_1 \otimes |x^n(1)\rangle\langle x^n(1)|^{X^n} \otimes |1\rangle\langle 1|^{\hat{M}},$$

with $\Theta' = \operatorname{Tr}_{\hat{M}} \Gamma'$. When averaged over Alice's random choice of HSW code, $\Theta'$ is precisely equal to the state which would arise via the action of the modified channel $\mathcal{N}'$. This is because

$$\mathbb{E}_{\mathcal{C}} \Theta' = (1^B \otimes \mathcal{N}'^{\otimes n})(\Upsilon), \tag{7.5}$$

where we have written the joint state which results when Alice prepares $\phi_{x^n}$ as

However, our choice of a good HSW code ensures that he can almost perfectly reconstruct Alice's message. A consequence of this will be that the two states $\Theta$ and $\Theta'$ are almost the same, as we will now demonstrate.

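In what follows, we will need to keep explicit track of the randomness in our codes by means of superscripts, which index the deterministic codes occurring with probabilities $p_{\mathcal{C}}$ and $q_\beta$. Rewriting (7.4) as

$$\sum_{\mathcal{C}\beta} p_{\mathcal{C}} q_\beta \operatorname{Tr}\bigl(1 \otimes \Lambda_1^{\mathcal{C}}\bigr)\Theta_1^{\mathcal{C}\beta} \ge 1 - 2\epsilon,$$

it is clear that we may write

$$\operatorname{Tr}\bigl(1 \otimes \Lambda_1^{\mathcal{C}}\bigr)\Theta_1^{\mathcal{C}\beta} \ge 1 - \epsilon_{\mathcal{C}\beta}$$

for nonnegative numbers $\{\epsilon_{\mathcal{C}\beta}\}$ chosen to satisfy

$$\sum_{\mathcal{C}\beta} p_{\mathcal{C}} q_\beta\, \epsilon_{\mathcal{C}\beta} = 2\epsilon.$$

By the gentle measurement lemma,

$$\Bigl\|\bigl(1 \otimes \sqrt{\Lambda_1^{\mathcal{C}}}\bigr)\Theta_1^{\mathcal{C}\beta}\bigl(1 \otimes \sqrt{\Lambda_1^{\mathcal{C}}}\bigr) - \Theta_1^{\mathcal{C}\beta}\Bigr\|_1 \le \sqrt{8\epsilon_{\mathcal{C}\beta}},$$

and thus, by the concavity of the square root function,

$$\mathbb{E}_{\mathcal{C}\beta}\Bigl\|\bigl(1 \otimes \sqrt{\Lambda_1}\bigr)\Theta_1\bigl(1 \otimes \sqrt{\Lambda_1}\bigr) - \Theta_1\Bigr\|_1 \le \sqrt{16\epsilon} = 4\sqrt{\epsilon}.$$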
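Along with (7.4) and monotonicity with respect to $\operatorname{Tr}_{\hat{M}}$, this estimate lets us write

$$\mathbb{E}_{\mathcal{C}\beta}\|\Theta - \Theta'\|_1 \le \mathbb{E}_{\mathcal{C}\beta}\|\Gamma - \Gamma'\|_1 = \mathbb{E}_{\mathcal{C}\beta}\Bigl\|\bigl(1 \otimes \sqrt{\Lambda_1}\bigr)\Theta_1\bigl(1 \otimes \sqrt{\Lambda_1}\bigr) - \Theta_1\Bigr\|_1 + \mathbb{E}_{\mathcal{C}\beta}\sum_{m=2}^{2^{nR}} \operatorname{Tr}(1 \otimes \Lambda_m)\Theta_1 \tag{7.6}$$

$$\le 4\sqrt{\epsilon} + 2\epsilon \le 5\sqrt{\epsilon}, \tag{7.7}$$

provided that $\epsilon < 1/4$. Since the entanglement fidelity is linear in $\mathcal{D}(\Theta)$, which is itself linear in $\Theta$, we can also use the special triangle inequality to write

$$\begin{aligned} F\bigl(|\Phi\rangle, \mathcal{D}(\mathbb{E}_{\mathcal{C}\beta}\Theta)\bigr) &= F\bigl(|\Phi\rangle, \mathbb{E}_\beta \mathcal{D}(\mathbb{E}_{\mathcal{C}}\Theta)\bigr) \\ &\ge F\bigl(|\Phi\rangle, \mathbb{E}_\beta \mathcal{D}(\mathbb{E}_{\mathcal{C}}\Theta')\bigr) - \bigl\|\mathbb{E}_\beta \mathcal{D}(\mathbb{E}_{\mathcal{C}}\Theta') - \mathbb{E}_\beta \mathcal{D}(\mathbb{E}_{\mathcal{C}}\Theta)\bigr\|_1. \end{aligned}$$

Using our earlier observation (7.5) and the definition of a $(Q, n, \epsilon)$ entanglement transmission code, we can bound the first term as

$$F\bigl(|\Phi\rangle, \mathbb{E}_\beta \mathcal{D}(\mathbb{E}_{\mathcal{C}}\Theta')\bigr) = F\bigl(|\Phi\rangle, \mathbb{E}_\beta \mathcal{D} \circ \mathcal{N}'^{\otimes n}(\Upsilon)\bigr) \ge 1 - \epsilon.$$

An estimate on the second term is obtained via

$$\begin{aligned} \bigl\|\mathbb{E}_\beta \mathcal{D}(\mathbb{E}_{\mathcal{C}}\Theta) - \mathbb{E}_\beta \mathcal{D}(\mathbb{E}_{\mathcal{C}}\Theta')\bigr\|_1 &\le \mathbb{E}_\beta \bigl\|\mathcal{D}(\mathbb{E}_{\mathcal{C}}\Theta) - \mathcal{D}(\mathbb{E}_{\mathcal{C}}\Theta')\bigr\|_1 \\ &\le \mathbb{E}_\beta \|\mathbb{E}_{\mathcal{C}}\Theta - \mathbb{E}_{\mathcal{C}}\Theta'\|_1 \\ &\le \mathbb{E}_{\mathcal{C}\beta}\|\Theta - \Theta'\|_1 \\ &\le 5\sqrt{\epsilon}, \end{aligned}$$

where the first three lines are by convexity, monotonicity, and convexity once again of the trace norm. The last inequality follows from (7.7). Putting these together gives

$$\mathbb{E}_{\mathcal{C}\beta} F\bigl(|\Phi\rangle, \mathcal{D}(\Theta)\bigr) \ge 1 - \epsilon - 5\sqrt{\epsilon} \ge 1 - 6\sqrt{\epsilon}. \tag{7.8}$$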

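At last, observe that the final decoded state $\Omega$ (which still depends on both sources of randomness $\mathcal{C}$ and $\beta$) is equal to

implicitly defining the desired decoding instrument $\mathcal{D} : C^n \to \hat{B}\hat{M}$. The expectation of (7.1) can now be bounded as

$$\begin{aligned} \mathbb{E}_{\mathcal{C}\beta}\, 2^{-nR} \sum_{m=1}^{2^{nR}} P(m, \Upsilon) &= \mathbb{E}_{\mathcal{C}\beta} P(1, \Upsilon) \\ &= F\bigl(|1\rangle|\Phi\rangle, \mathbb{E}_{\mathcal{C}\beta}\Omega\bigr) \\ &\ge 1 - \bigl\|\operatorname{Tr}_{B\hat{B}} \mathbb{E}_{\mathcal{C}\beta}\Omega - |1\rangle\langle 1|\bigr\|_1 - 3\bigl(1 - F(|\Phi\rangle, \operatorname{Tr}_{\hat{M}} \mathbb{E}_{\mathcal{C}\beta}\Omega)\bigr) \\ &\ge 1 - 2\sqrt{2\epsilon} - 18\sqrt{\epsilon} \\ &\ge 1 - 21\sqrt{\epsilon}. \end{aligned}$$

The third line above is by Lemma 2. The first estimate in the fourth line follows from (7.4) together with (6.3), while the second estimate is by (7.8). We may now conclude that there are particular values of the randomness indices $\beta$ and $\mathcal{C}$ such that the same bound is satisfied for a deterministic code. We have thus proven that $(\{\phi_m\}_{m \in 2^{nR}}, \Upsilon, \mathcal{D})$ comprises an $(R, Q, n, 21\sqrt{\epsilon})$ entanglement generation code. This concludes the proof of the coding theorem. ■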
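The constants in the proof above come from a short chain of elementary estimates. Concavity turns $\sqrt{8\epsilon_{\mathcal{C}\beta}}$ into at most $\sqrt{8 \cdot 2\epsilon} = 4\sqrt{\epsilon}$. Next, $4\sqrt{\epsilon} + 2\epsilon \le 5\sqrt{\epsilon}$ when $\epsilon \le 1/4$. Then $\epsilon + 5\sqrt{\epsilon} \le 6\sqrt{\epsilon}$. Finally, $2\sqrt{2\epsilon} + 3 \cdot 6\sqrt{\epsilon} \le 21\sqrt{\epsilon}$, since $2\sqrt{2} < 3$. The sketch below checks this arithmetic on a grid of $\epsilon$ values. It verifies the bookkeeping only, not the information-theoretic steps.

```python
import math


def check_constants(points=1000, tol=1e-12):
    """Check the numerical estimates behind (7.7), (7.8) and the final bound."""
    for i in range(1, points + 1):
        eps = 0.25 * i / points              # grid on (0, 1/4]
        r = math.sqrt(eps)
        assert abs(math.sqrt(8 * (2 * eps)) - 4 * r) <= tol        # concavity step
        assert 4 * r + 2 * eps <= 5 * r + tol                       # (7.7)
        assert eps + 5 * r <= 6 * r + tol                           # (7.8)
        assert 2 * math.sqrt(2 * eps) + 3 * (6 * r) <= 21 * r + tol  # final bound
    return True
```

The restriction $\epsilon < 1/4$ matters only for (7.7). At $\epsilon = 0.3$, for example, $4\sqrt{\epsilon} + 2\epsilon$ already exceeds $5\sqrt{\epsilon}$.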
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Proof of Theorem 2 (coding theorem). Begin by fixing bipartite pure states l^^i)"^"^' 
and which give rise to the state 

and defining = Tta ^1, = Ttb^2- Letting e, 5 > be arbitrary, we will show 
that there exists a {QaiQbin^e) qq entanglement transmission code where 

Qa = IM" )C)^ - 6 and Q, = h{B" )A"C)^ - 6 

provided that QaiQb ^ 0. Note that the rates in Theorem 2 will be implied by taking 
the channel to be Af^^, with u^^'"'' defined similarly. 

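Let us begin by choosing an isometric extension $U_\mathcal{N}: A'B' \to CE$ of $\mathcal{N}$. Define the ideal channel $\mathcal{N}_1: A' \to C$ which would effectively be seen by Alice were Bob's average code density operator exactly equal to $\rho_2^{\otimes n}$ as

$$\mathcal{N}_1: \tau \mapsto \mathcal{N}(\tau\otimes\rho_2).$$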

We now use $\Psi_2$ to define a particular isometric extension $U_{\mathcal{N}_1}: A' \to CE'$ of $\mathcal{N}_1$, where $E' = B''E$, as

$$U_{\mathcal{N}_1}: \tau \mapsto U_\mathcal{N}(\tau\otimes\Psi_2).$$

Observe that Bob's fake input $B''$ is treated as part of the environment of Alice's ideal induced channel. We then further define the channel $\mathcal{N}_2: B' \to A''C$ by

$$\mathcal{N}_2: \tau \mapsto \mathcal{N}(\Psi_1\otimes\tau).$$

In contrast to the interpretation of $\mathcal{N}_1$, this may be viewed as the channel which would be seen by Bob if Alice were to input the $A'$ part of the purification $|\Psi_1\rangle^{A''A'}$ of $\rho_1$ to her input of the channel and then send the $A''$ system to Charlie via a noiseless quantum channel. As in the proof of Theorem 1, Charlie will first decode Alice's information, after which he will attempt to simulate the channel $\mathcal{N}_2$. This allows a higher transmission rate for Bob than if Alice's information were treated as noise.
Since quantum information cannot be copied, showing that this is indeed possible will require different techniques than were utilized in the previous coding theorem. Although ensembles of random codes will be used in this proof, we introduce the technique of coherent coding, in which we pretend that the common randomness is purified. The main advantage of this approach is that working with states in the enlarged Hilbert space allows monotonicity to be easily exploited in order to provide the estimates we require. Additionally, before we derandomize at the end of the proof, it will ultimately be only Bob who is using a random code. Alice will be able to use any deterministic code from her random ensemble. This is because Charlie will implement a decoding procedure which produces a global state close to the one that would have been created had Alice coded with the coherent randomness. To show this, we will first analyze the state which would result if both senders used their full ensembles of codes. Then we show that if Alice uses any code from her ensemble, Charlie can create the proper global state himself. This allows him to effectively simulate $\mathcal{N}_2$ and ultimately decode both states at the desired rates.

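By Proposition ??, for large enough $n$, there exists a $(Q_A, n, \epsilon)$ random entanglement generation code $(p_\ell, |\Upsilon^{(\ell)}\rangle^{AA'^n}, \mathcal{D}_\ell)$ for the channel $\mathcal{N}_1$, where $Q_A = I_c(\rho_1, \mathcal{N}_1) - \delta = I_c(A''\rangle C)_\omega - \delta$. There similarly exists a $(Q_B, n, \epsilon)$ random entanglement generation code $(q_m, |\Upsilon^{(m)}\rangle^{BB'^n}, \mathcal{D}_m)$ for $\mathcal{N}_2$, with $Q_B = I_c(\rho_2, \mathcal{N}_2) - \delta = I_c(B''\rangle A''C)_\omega - \delta$. Proposition ?? further guarantees that these codes can be chosen so that their respective average code density operators

$$\varrho_1 = \sum_\ell p_\ell\,\mathrm{Tr}_A\Upsilon^{(\ell)} \quad\text{and}\quad \varrho_2 = \sum_m q_m\,\mathrm{Tr}_B\Upsilon^{(m)}$$

satisfy

$$\|\varrho_1 - \rho_1^{\otimes n}\|_1 \leq \epsilon \quad\text{and}\quad \|\varrho_2 - \rho_2^{\otimes n}\|_1 \leq \epsilon. \tag{7.9}$$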



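Recall that by Proposition 2 we may choose isometric extensions $U_{\mathcal{D}_\ell}: C^n \to \hat{A}F$ implementing the $\mathcal{D}_\ell$ from Alice's random code which satisfy

$$F\big(|\Phi_1\rangle^{A\hat{A}}|\Lambda\rangle^{FE'^n}, \mathcal{U}_{\mathcal{D}_\ell}\circ\mathcal{U}_{\mathcal{N}_1}^{\otimes n}(\Upsilon^{(\ell)})\big) \geq 1-\epsilon \tag{7.10}$$

for every random code index $\ell$ and the same fixed state $|\Lambda\rangle^{FE'^n}$.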

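Let the code common randomness between Alice and Charlie be held between the systems $L_A$ and $L_C$, represented by the state

$$\gamma_1^{L_AL_C} = \sum_\ell p_\ell\,|\ell\rangle\langle\ell|^{L_A}\otimes|\ell\rangle\langle\ell|^{L_C},$$

defining a similar state $\gamma_2^{M_BM_C}$ for the Bob-Charlie common randomness. For convenience, let us further pretend that $\gamma_1$ is part of a pure state

$$|\Gamma_1\rangle^{L_EL_AL_C} = \sum_\ell\sqrt{p_\ell}\,|\ell\rangle^{L_E}|\ell\rangle^{L_A}|\ell\rangle^{L_C}.$$

Similarly, let $\gamma_2$ be purified by $|\Gamma_2\rangle^{M_EM_BM_C}$. Write controlled encoding isometries $\mathcal{E}_1: L_A\to L_AAA'^n$ and $\mathcal{E}_2: M_B\to M_BBB'^n$ as

$$\mathcal{E}_1 = \sum_\ell|\ell\rangle|\Upsilon^{(\ell)}\rangle\langle\ell|, \qquad \mathcal{E}_2 = \sum_m|m\rangle|\Upsilon^{(m)}\rangle\langle m|.$$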

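The states which would arise if Alice and Bob each encoded coherently are

$$|\Upsilon_1\rangle = \mathcal{E}_1|\Gamma_1\rangle = \sum_\ell\sqrt{p_\ell}\,|\ell\rangle^{L_E}|\ell\rangle^{L_A}|\ell\rangle^{L_C}|\Upsilon^{(\ell)}\rangle^{AA'^n}, \qquad |\Upsilon_2\rangle = \mathcal{E}_2|\Gamma_2\rangle = \sum_m\sqrt{q_m}\,|m\rangle^{M_E}|m\rangle^{M_B}|m\rangle^{M_C}|\Upsilon^{(m)}\rangle^{BB'^n}.$$

Note that we have abbreviated $L = L_EL_AL_C$ and $M = M_EM_BM_C$. Each $|\Upsilon_i\rangle$ is a purification of $\varrho_i$. Together with (7.9), Uhlmann's theorem therefore tells us that there exist unitaries $V_1: LA \to A''^n$ and $V_2: MB \to B''^n$ such that

$$F(V_i|\Upsilon_i\rangle, |\Psi_i\rangle^{\otimes n}) \geq 1-\epsilon. \tag{7.11}$$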
Further define a corresponding controlled isometric decoder $U_\mathcal{D}: L_CC^n \to L_C\hat{A}F$ for Alice's code as

$$U_\mathcal{D} = \sum_\ell|\ell\rangle\langle\ell|^{L_C}\otimes U_{\mathcal{D}_\ell}.$$

Let us now imagine that each of Alice and Bob encodes using the coherent common randomness, resulting in a joint pure state $U_\mathcal{N}^{\otimes n}|\Upsilon_1\rangle|\Upsilon_2\rangle$ on $LAMBC^nE^n$. If Charlie then applies the full controlled decoder from Alice's code, the resulting global pure state would be

$$|\Theta\rangle^{LA\hat{A}MBFE^n} = U_\mathcal{D}U_\mathcal{N}^{\otimes n}|\Upsilon_1\rangle|\Upsilon_2\rangle.$$

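For each $\ell$, let us define an isometry $U_\ell: B'^n \to A\hat{A}FE^n$ as

$$U_\ell|\beta\rangle = U_{\mathcal{D}_\ell}U_\mathcal{N}^{\otimes n}|\Upsilon^{(\ell)}\rangle|\beta\rangle,$$

which we use to define the pure states

$$|\Theta_\ell\rangle^{A\hat{A}FE^nMB} = U_\ell|\Upsilon_2\rangle.$$

These definitions allow us to express

$$|\Theta\rangle = \sum_\ell\sqrt{p_\ell}\,|\ell\rangle^{L_E}|\ell\rangle^{L_A}|\ell\rangle^{L_C}|\Theta_\ell\rangle.$$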

Further writing I^x'^-^A'^-B-^" = V2~^| A)"^^""^", the following bound applies 

= f(|<i.i)|a)^^""^", 0^0 1/21x2) 



> 1 - 2V1 - F (|<l>i)|A)^^""-^", 0^|^2)®")) 



-2v/l-F(V^2|T2),|^2)^") 



> 1-2^1- F{\^,)\\y^'\U^^oUfr^o\T[))-2^e 
Above, the second equality is because the actions of and V2 commute, the first 
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inequality is by the triangle inequality and monotonicity with respect to O^, while 
for the second inequality, we have just rewritten the first term and used (|7.11|) for 
the second. The last bound is from (j7.1(J|) . Observe that we are still free to specify 
the global phases of the outputs of the so that the above bound further implies 
{ei\\^i)\X') > (1 - 4v/?)i/2 for each i. Consequently, 



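$$\begin{aligned}
F(|\Theta\rangle, |\Gamma_1\rangle|\Phi_1\rangle|\chi'\rangle) &= \Big|\sum_\ell\sqrt{p_\ell}\sqrt{p_\ell}\,\langle\Theta_\ell|\big(|\Phi_1\rangle|\chi'\rangle\big)\Big|^2\\
&= \Big(\sum_\ell p_\ell\,\langle\Theta_\ell|\big(|\Phi_1\rangle|\chi'\rangle\big)\Big)^2\\
&\geq 1-4\sqrt{\epsilon}.
\end{aligned}$$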



Essentially, the subsystems $L$, $A\hat{A}$ and $MBFE^n$ of $|\Theta\rangle$ are mutually decoupled.

As mentioned earlier, it will be sufficient for Alice to encode with any deterministic code from the random ensemble. Without loss of generality, we assume that Alice uses the first code ($\ell = 1$) in her ensemble. Bob, on the other hand, will need to use randomness to ensure that Alice's effective channel is close to a product channel. The state on $AMBC^nE^n$ which results from these encodings is

We will now describe a procedure by which Charlie first decodes Alice's information, then produces a global state which is close to $|\Theta\rangle$, making it look as if Alice had in fact used the coherent coding procedure. This will allow Charlie to apply local unitaries to effectively simulate the channel $\mathcal{N}_2$ for which Bob's random code was designed, enabling him to decode Bob's information as well. These steps will constitute Charlie's decoding $\mathcal{D}: M_CC^n \to M_C\hat{A}\hat{B}$, which depends on the Bob-Charlie common randomness. The existence of a deterministic decoder will then be inferred.

Charlie first applies the isometric decoder $U_{\mathcal{D}_1}$, placing all systems into the state $|\Theta_1\rangle$. He then removes his local system $\hat{A}$ and replaces it with the corresponding part of the locally prepared pure state $|\Phi_1\rangle^{A\hat{A}}$. It is important that he keep the removed $\hat{A}$ in a safe place, as it represents the decoder output for Alice's quantum information. Charlie also locally prepares the state $|\Gamma_1\rangle^L$. The resulting state $\Theta'$ satisfies

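$$\begin{aligned}
F(\Theta',\Theta) &\geq 1 - \|\mathrm{Tr}_{A\hat{A}}\Theta_1 - \chi'\|_1 - \|\chi' - \mathrm{Tr}_{LA\hat{A}}\Theta\|_1 - 3\big(1-F(|\Gamma_1\rangle|\Phi_1\rangle, \mathrm{Tr}_{MBFE^n}\Theta)\big)\\
&\geq 1 - 2\sqrt{4\sqrt{\epsilon}} - 2\sqrt{4\sqrt{\epsilon}} - 12\sqrt{\epsilon}\\
&\geq 1 - 9\epsilon^{1/4}
\end{aligned}\tag{7.12}$$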

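whenever $\epsilon < 12^{-4}$. The first line combines Lemma 2 and the triangle inequality. The first two estimates in the second line are from applying (6.3), together with monotonicity with respect to $\mathrm{Tr}_{A\hat{A}}$ and $\mathrm{Tr}_{LA\hat{A}}$, to the previous two estimates. The last estimate in that line is from monotonicity with respect to the map $\mathrm{Tr}_{MBFE^n}$ applied to the previous estimate. Next, Charlie will apply $V_1\circ U_\mathcal{D}^\dagger$ to $\Theta'$¹ in order to simulate the channel $\mathcal{N}_2$. To see that this will work, define $\mathcal{M}: LA\hat{A}FE^n \to A''^nC^n$ as $\mathcal{M} = \mathrm{Tr}_{E^n}\circ\mathcal{V}_1\circ\mathcal{U}_\mathcal{D}^\dagger$. Observe that, by monotonicity with respect to $\mathcal{N}^{\otimes n}(\,\cdot\,\otimes\Upsilon_2)$ and (7.11), the states $\mathcal{M}(\Theta)$ and $\mathcal{N}_2^{\otimes n}(\Upsilon_2)$ on $MBA''^nC^n$ satisfy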

F(A^(e),Ar|^"(T2)) = F(l^ioAr«"(Ti®T2),Ar®"(*f"®T2)) 

> F(Vi|Ti),|M/i)«") 

> 1 -e. 

¹ This operation only acts on Charlie's local systems, i.e. $V_1\circ U_\mathcal{D}^\dagger: LA\hat{A}F \to A''^nC^n$.

We may now use the triangle inequality and monotonicity with respect to $\mathcal{M}$ to combine our last two estimates, yielding

F(7W(0O,Ar2®"(T2)) > 1 - 2y/l - F {M{Q'),M{Q)) 

-2^1 -F[M{Q), Mr 0^2)) 

> 1 - 2V9ei/4 _ 2Ve 

> l-7e^/* (7.13) 

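whenever $\epsilon < 2^{-8/3}$. We have thus far shown that Charlie's decoding procedure succeeds in simulating the channel $\mathcal{N}_2^{\otimes n}$ while simultaneously recovering Alice's quantum information. Charlie now uses the controlled decoder $\mathcal{D}_2: M_CA''^nC^n \to M_C\hat{B}$ defined as

$$\mathcal{D}_2 = \sum_m|m\rangle\langle m|^{M_C}\otimes\mathcal{D}_m$$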

to decode Bob's quantum information. This entire procedure has defined our decoder $\mathcal{D}: M_CC^n \to M_C\hat{A}\hat{B}$. It gives rise to a global state $\Omega$ representing the final output state of the protocol, averaged over Bob's common randomness. This state satisfies

$$F(|\Phi_1\rangle, \mathrm{Tr}_{B\hat{B}}\Omega) \geq F(\Theta,\Theta')$$

because of monotonicity with respect to $\mathrm{Tr}_{LMBFE^n}$ applied to the bound (7.12). We then use three facts: the triangle inequality, the fact that Bob's codes are $\epsilon$-good for each $m$, and monotonicity of the estimate (7.13) with respect to $\mathrm{Tr}_M\circ\mathcal{D}_2$. With these, the global state can further be seen to obey

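$$\begin{aligned}
F(|\Phi_2\rangle, \mathrm{Tr}_{A\hat{A}}\Omega) &= F(|\Phi_2\rangle, \mathrm{Tr}_M\,\mathcal{D}_2\circ\mathcal{M}(\Theta'))\\
&\geq 1 - 2\sqrt{1-F\big(|\Phi_2\rangle, \mathrm{Tr}_M\,\mathcal{D}_2\circ\mathcal{N}_2^{\otimes n}(\Upsilon_2)\big)}\\
&\qquad - 2\sqrt{1-F\big(\mathrm{Tr}_M\,\mathcal{D}_2\circ\mathcal{N}_2^{\otimes n}(\Upsilon_2), \mathrm{Tr}_M\,\mathcal{D}_2\circ\mathcal{M}(\Theta')\big)}\\
&\geq 1 - 2\sqrt{\epsilon} - 2\sqrt{7\epsilon^{1/8}}\\
&\geq 1 - 7\epsilon^{1/16}
\end{aligned}$$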

as long as $\epsilon < 2^{-16/7}$. Along with (6.3), a final application of Lemma 2 combines the above two bounds to give

$$\begin{aligned}
F(|\Phi_1\rangle|\Phi_2\rangle, \Omega) &\geq 1 - \|\Phi_1 - \mathrm{Tr}_{B\hat{B}}\Omega\|_1 - 3\big(1-F(|\Phi_2\rangle, \mathrm{Tr}_{A\hat{A}}\Omega)\big)\\
&\geq 1 - 2\sqrt{9\epsilon^{1/4}} - 21\epsilon^{1/16}\\
&\geq 1 - 22\epsilon^{1/16},
\end{aligned}$$

provided that $\epsilon < 6^{-16}$. Since this estimate represents an average over Bob's common randomness, there must exist a particular value $m^*$ of the common randomness for which the corresponding deterministic code is at least as good as the random one. This concludes the coding theorem. ■
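The proof of the qq coding theorem folds several accumulated error terms into a single power of $\epsilon$, each time under a smallness condition on $\epsilon$. The Python sketch below is an illustration only. It checks these four foldings numerically for values of $\epsilon$ below the stated thresholds. The error terms are written out as we read them from the chains above: for instance, the two $2\sqrt{4\sqrt{\epsilon}}$ terms in (7.12), and the $2\sqrt{7\epsilon^{1/8}}$ term in the bound on Bob's fidelity. The dictionary `CHECKS` and the helper `holds_below` are hypothetical names.

```python
import math

sqrt = math.sqrt

# label: (accumulated error, folded error, threshold on eps)
CHECKS = {
    "(7.12)": (lambda e: 2 * sqrt(4 * sqrt(e)) + 2 * sqrt(4 * sqrt(e)) + 12 * sqrt(e),
               lambda e: 9 * e ** (1 / 4),
               12.0 ** -4),
    "(7.13)": (lambda e: 2 * sqrt(9 * e ** (1 / 4)) + 2 * sqrt(e),
               lambda e: 7 * e ** (1 / 8),
               2.0 ** (-8 / 3)),
    "Bob": (lambda e: 2 * sqrt(e) + 2 * sqrt(7 * e ** (1 / 8)),
            lambda e: 7 * e ** (1 / 16),
            2.0 ** (-16 / 7)),
    "final": (lambda e: 2 * sqrt(9 * e ** (1 / 4)) + 3 * 7 * e ** (1 / 16),
              lambda e: 22 * e ** (1 / 16),
              6.0 ** -16),
}


def holds_below(label, steps=80):
    """Check lhs(eps) <= rhs(eps) for eps = threshold * 10**(-k/8), k = 1..steps."""
    lhs, rhs, threshold = CHECKS[label]
    for k in range(1, steps + 1):
        eps = threshold * 10 ** (-k / 8)
        if lhs(eps) > rhs(eps) * (1 + 1e-12):
            return False
    return True


for label in CHECKS:
    print(label, holds_below(label))
```

The thresholds matter. For example, at $\epsilon = 10^{-3}$, which is above $12^{-4}$, the folding in (7.12) fails.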



7.2 The converse theorems 

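We will now demonstrate that

$$\mathcal{CQ}_{eg}(\mathcal{N}) \subseteq \overline{\bigcup_{k=1}^{\infty}\frac{1}{k}\mathcal{CQ}^{(1)}(\mathcal{N}^{\otimes k})} \quad\text{and}\quad \mathcal{Q}_{eg}(\mathcal{N}) \subseteq \overline{\bigcup_{k=1}^{\infty}\frac{1}{k}\mathcal{Q}^{(1)}(\mathcal{N}^{\otimes k})},$$

where the single-letter regions $\mathcal{CQ}^{(1)}(\mathcal{N})$ and $\mathcal{Q}^{(1)}(\mathcal{N})$ are those defined at the beginning of the last section.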

Proof of Theorem 1 (converse). Suppose there exists a sequence of $(R, Q, n, \epsilon_n)$ entanglement generation codes with $\epsilon_n \to 0$. Fixing a blocklength $n$, let $\{\phi_m\}$, $|\Upsilon\rangle^{BB'^n}$, $\mathcal{D}$ comprise the corresponding cq entanglement generation code. The state induced by the encoding is

$$\Omega^{MBC^n} = 2^{-nR}\sum_{m=1}^{2^{nR}}|m\rangle\langle m|^M\otimes(\mathbb{1}^B\otimes\mathcal{N}^{\otimes n})(\phi_m\otimes\Upsilon).$$

After application of the decoding instrument $\mathcal{D}: C^n \to \hat{M}\hat{B}$, this state becomes

$$\Omega'^{MB\hat{M}\hat{B}} = (\mathbb{1}^{MB}\otimes\mathcal{D})(\Omega).$$

An upper bound on the classical rate of the code can be obtained as follows:

$$\begin{aligned}
nR &= H(M)_{\Omega'}\\
&= I(M;\hat{M})_{\Omega'} + H(M|\hat{M})_{\Omega'}\\
&\leq I(M;\hat{M})_{\Omega'} + H(\epsilon_n) + nR\epsilon_n\\
&\leq I(M;C^n)_\Omega + n\epsilon'_n.
\end{aligned}$$

The first inequality follows from Fano's inequality (Lemma 3), while in the second we use the Holevo bound (Lemma 7) and define $\epsilon'_n = \frac{1}{n} + R\epsilon_n$. The quantum rate of the code is upper bounded as

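$$\begin{aligned}
I_c(B\rangle C^nM)_\Omega &\geq I_c(B\rangle\hat{B}\hat{M})_{\Omega'}\\
&\geq I_c(B\rangle\hat{B})_{\Omega'}\\
&\geq I_c(B\rangle\hat{B})_\Phi - 2H(\epsilon_n) - 8nQ\sqrt{\epsilon_n}\\
&\geq nQ - n\epsilon''_n.
\end{aligned}$$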

Above, the first two inequalities are consequences of the data processing inequality (Lemma ??). The last inequality combines Lemma ?? with (6.3), together with the definition $\epsilon''_n = \frac{2}{n} + 8Q\sqrt{\epsilon_n}$. Setting $X = M$, we have thus proven that

$$R \leq \frac{1}{n}I(X;C^n)_\Omega + \epsilon'_n, \qquad Q \leq \frac{1}{n}I_c(B\rangle C^nX)_\Omega + \epsilon''_n$$

whenever $(R, Q)$ is an achievable cq rate pair for entanglement generation, where $\epsilon'_n, \epsilon''_n \to 0$. It follows that for any achievable rate pair $(R, Q)$, any $\delta > 0$ and all sufficiently large $n$, we have

$$(R-\delta, Q-\delta) \in \frac{1}{n}\mathcal{CQ}^{(1)}(\mathcal{N}^{\otimes n}) \subseteq \mathcal{CQ}(\mathcal{N}).$$

Since $\mathcal{CQ}(\mathcal{N})$ is closed by definition, this completes the proof. ■
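The classical rate bound in the converse rests on Fano's inequality. It is used in the slightly weakened form $H(M|\hat M) \le H(p_e) + p_e \log|M|$, with $\log|M| = nR$, where the decoding error probability $p_e$ plays the role of $\epsilon_n$. As an illustration only, the Python sketch below checks this inequality on randomly generated joint distributions of a uniform message $M$ and a decoded message $\hat M$. The helper names `entropy` and `fano_sides` are hypothetical.

```python
import math
import random


def entropy(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def fano_sides(size, rng):
    """Random joint law of (M, M_hat) with M uniform on `size` messages.

    Returns (H(M | M_hat), h(p_err) + p_err * log2(size)), the two sides of
    Fano's inequality in the weakened form used in the converse.
    """
    joint = []
    for m in range(size):
        row = [rng.random() for _ in range(size)]
        row[m] += size * rng.random()                    # favour correct decoding
        total = sum(row)
        joint.append([w / (total * size) for w in row])  # Pr{M = m, M_hat = k}
    p_err = 1 - sum(joint[m][m] for m in range(size))
    h_joint = entropy([p for row in joint for p in row])
    h_hat = entropy([sum(joint[m][k] for m in range(size)) for k in range(size)])
    lhs = h_joint - h_hat                                # H(M | M_hat)
    rhs = entropy([p_err, 1 - p_err]) + p_err * math.log2(size)
    return lhs, rhs


rng = random.Random(0)
for size in (2, 4, 8, 16):
    for _ in range(200):
        lhs, rhs = fano_sides(size, rng)
        assert lhs <= rhs + 1e-9
print("Fano's inequality held in every trial")
```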

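Proof of Theorem 2 (converse). Suppose that $(Q_A, Q_B)$ is an achievable qq rate pair for entanglement generation. By definition, this means that there must exist a sequence of $(Q_A, Q_B, n, \epsilon_n)$ entanglement generation codes with $\epsilon_n \to 0$. Fixing a blocklength $n$, let $|\Upsilon_1\rangle^{AA'^n}$, $|\Upsilon_2\rangle^{BB'^n}$ and $\mathcal{D}: C^n \to \hat{A}\hat{B}$ comprise the corresponding encodings and decodings. Define

$$\Omega^{ABC^n} = (\mathbb{1}^{AB}\otimes\mathcal{N}^{\otimes n})(\Upsilon_1\otimes\Upsilon_2)$$

to be the result of sending the respective $A'^n$ and $B'^n$ parts of $\Upsilon_1$ and $\Upsilon_2$ through the channel $\mathcal{N}^{\otimes n}$. Further defining

$$\Omega'^{AB\hat{A}\hat{B}} = (\mathbb{1}^{AB}\otimes\mathcal{D})(\Omega)$$

as the corresponding state after decoding, the entanglement fidelity of the code satisfies

$$F(|\Phi_1\rangle\otimes|\Phi_2\rangle, \Omega') \geq 1-\epsilon_n, \tag{7.14}$$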

where l^i)"^"^ and |$2)^^ are the maximally entangled target states. The sum rate 
can be bounded as 

IMB)C^)^ > lMB)AB)n 

> 1MB )AB)^,r^^, - 2H{e^) - 8n{Q, + Q,)^ 

> n{Qa + Qb) - ne'^. 



The first step is by the data processing inequality (Lemma ??). The second step uses Lemma ?? and (6.3), along with monotonicity applied to (7.14). The last step uses the definition $\epsilon'_n = \frac{2}{n} + 8(Q_A+Q_B)\sqrt{\epsilon_n}$, and holds because the binary entropy $H(\cdot)$ is upper bounded by 1. We can bound Alice's rate $Q_A$ by writing

$$\begin{aligned}
I_c(A\rangle BC^n)_\Omega &\geq I_c(A\rangle C^n)_\Omega\\
&\geq I_c(A\rangle\hat{A}\hat{B})_{\Omega'}\\
&\geq I_c(A\rangle\hat{A})_{\Omega'}\\
&\geq I_c(A\rangle\hat{A})_{\Phi_1} - 2H(\epsilon_n) - 8nQ_A\sqrt{\epsilon_n}\\
&\geq nQ_A - n\epsilon'_n.
\end{aligned}$$

The first three steps above are by data processing (Lemma ??). The remaining steps hold for the same reasons as in the previous chain of inequalities. Similarly, Bob's rate must also satisfy

$$nQ_B \leq I_c(B\rangle AC^n)_\Omega + n\epsilon'_n.$$

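Since $\epsilon_n \to 0$ implies $\epsilon'_n \to 0$, this means that for every $\delta > 0$ and all sufficiently large $n$, any achievable qq rate pair $(Q_A, Q_B)$ must satisfy

$$(Q_A-\delta, Q_B-\delta) \in \frac{1}{n}\mathcal{Q}^{(1)}(\mathcal{N}^{\otimes n}) \subseteq \mathcal{Q}(\mathcal{N}).$$

Since $\mathcal{Q}(\mathcal{N})$ is closed by definition, this completes the proof. ■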



Chapter 8 

Transmission of quantum information

In the previous chapter, we proved the main theorems for the restricted case in which all quantum communication takes the form of generating quantum correlations between the senders and the receiver. The results of this chapter complete the proofs of the main theorems. They do so by extending the weaker error criteria of entanglement generation (which, incidentally, are analogous to a classical requirement on the average probability of error) to the stronger requirements of strong subspace transmission in the main theorem statements. As a first step, we demonstrate how the results of the last chapter immediately imply the ability to perform an intermediate task, entanglement transmission. Here the senders are required to transmit preexisting maximal entanglement, while still adhering to an average error criterion on the classical messages. We then show how to use a given entanglement transmission code to construct a strong subspace transmission code achieving any rates less than those of the original code, while paying a negligible price in fidelity.

8.1 Entanglement transmission 

Classical-quantum scenario. In this scenario, rather than generating entanglement with Charlie, Bob will act to transmit preexisting entanglement to him. We assume that Bob is presented with the $\tilde{B}$ part of the maximally entangled state $|\Phi\rangle^{B\tilde{B}}$. It is assumed that he has complete control over $\tilde{B}$, while he has no access to $B$. He will perform a physical operation in order to transfer the quantum information embodied in his system $\tilde{B}$ to the inputs $B'^n$ of the channel, modeled by an encoding operation $\mathcal{E}: \tilde{B} \to B'^n$. The goal of this encoding is to let Charlie, by post-processing the system $C^n$, end up holding the $\hat{B}$ part of a state close to the one that would have resulted had Bob sent his system through a perfect quantum channel $\mathrm{id}: \tilde{B} \to \hat{B}$. Here, we imagine that $\tilde{B}$ and $\hat{B}$ denote two distinct physical systems with the same number of quantum degrees of freedom. The role of the identity channel is to set up a unitary correspondence, or isomorphism, between the degrees of freedom of $\tilde{B}$ in Bob's laboratory and those of $\hat{B}$ in Charlie's. We will often tacitly assume that such an identity map has been specified ahead of time in order to judge how successful an imperfect quantum transmission has been. This convention will be taken for granted many times throughout the paper: specifying an arbitrary state $|\Psi\rangle^{B\tilde{B}}$ immediately specifies the state $|\Psi\rangle^{B\hat{B}} = (\mathbb{1}^B\otimes\mathrm{id})|\Psi\rangle^{B\tilde{B}}$. Decoding is the same as it is for entanglement generation.

({0m}me2"«; ^! ^) will be called an (i?, Q, n, e) cq entanglement transmission code 
for the channel N" if 

2-"^ J]Pf (m) > 1 -e, (8.1) 

■m=l 

where 

Pf (m) = F (|m) 1$)^^, T> o A/'®"(0;^'" ® ^($^^)) . (8.2) 

Achievable rate pairs and the capacity region CQct{Af) are defined analogous to those 
for entanglement generation. 

Quantum-quantum scenario. Alice and Bob each respectively have control over the $\tilde{A}$ and $\tilde{B}$ parts of the separate maximally entangled states $|\Phi_1\rangle^{A\tilde{A}}$ and $|\Phi_2\rangle^{B\tilde{B}}$, while neither has access to $A$ or $B$. Alice transfers the correlations in her system to the $A'^n$ parts of the inputs of $\mathcal{N}^{\otimes n}$ with an encoding operation $\mathcal{E}_1: \tilde{A} \to A'^n$. Bob acts similarly with $\mathcal{E}_2: \tilde{B} \to B'^n$. Their goal is to preserve the respective correlations, so that Charlie can apply a decoding operation $\mathcal{D}: C^n \to \hat{A}\hat{B}$ and end up holding the $\hat{A}\hat{B}$ part of a state which is close to $|\Phi_1\rangle^{A\hat{A}}|\Phi_2\rangle^{B\hat{B}}$. Formally, $(\mathcal{E}_1, \mathcal{E}_2, \mathcal{D})$ is a $(Q_A, Q_B, n, \epsilon)$ qq entanglement transmission code for the channel $\mathcal{N}$ if

$$F\big(|\Phi_1\rangle|\Phi_2\rangle, \mathcal{D}\circ\mathcal{N}^{\otimes n}\circ(\mathcal{E}_1\otimes\mathcal{E}_2)(\Phi_1\otimes\Phi_2)\big) \geq 1-\epsilon. \tag{8.3}$$

Achievable qq rate pairs for entanglement transmission and the capacity region $\mathcal{Q}_{et}(\mathcal{N})$ are defined as in the previous scenario.

8.2 Equivalence of entanglement transmission and entanglement generation

8.2.1 $\mathcal{CQ}_{eg} \subseteq \mathcal{CQ}_{et}$ and $\mathcal{Q}_{eg} \subseteq \mathcal{Q}_{et}$

Proof. This essentially follows as an artifact of the entanglement generation coding theorem from Section ??. There, the input preparation $|\Upsilon\rangle^{AA'^n}$ for a $(Q, n)$ entanglement generation code is constructed with the particular form

$$|\Upsilon\rangle^{AA'^n} = \frac{1}{\sqrt{2^{nQ}}}\sum_{a\in 2^{nQ}}|a\rangle^A|\phi_a\rangle^{A'^n},$$

where the $\{|\phi_a\rangle\}$ are orthogonal. Suppose the encoder acts on the $\tilde{A}$ part of the maximally entangled state

$$|\Phi\rangle^{A\tilde{A}} = \frac{1}{\sqrt{2^{nQ}}}\sum_{a\in 2^{nQ}}|a\rangle^A|a\rangle^{\tilde{A}}$$

with an encoding isometry $\mathcal{E}: \tilde{A} \to A'^n$ defined via

$$\mathcal{E}: |a\rangle^{\tilde{A}} \mapsto |\phi_a\rangle^{A'^n}.$$

Then the identity $\mathcal{E}|\Phi\rangle^{A\tilde{A}} = |\Upsilon\rangle^{AA'^n}$ holds trivially. It is thus a simple task to modify the proofs of Chapter 7 so that they instead prove the existence of the entanglement transmission codes described in the previous section. Indeed, if $(|\Upsilon\rangle, \{\phi_m\}, \mathcal{D})$ is an $(R, Q, n, \epsilon)$ cq entanglement generation code, there then exists an encoder $\mathcal{E}$ so that $(\mathcal{E}, \{\phi_m\}, \mathcal{D})$ is an $(R, Q, n, \epsilon)$ cq entanglement transmission code. Identical reasoning shows that every qq entanglement generation code has a corresponding qq entanglement transmission code with the same parameters. ■



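8.2.2 $\mathcal{CQ}_{et} \subseteq \mathcal{CQ}_{eg}$

Proof. Suppose there exists an $(R, Q, n, \epsilon)$ cq entanglement transmission code. It consists of classical message states $\{|\phi_m\rangle^{A'^n}\}_{m\in 2^{nR}}$, a quantum encoding map $\mathcal{E}: \tilde{B} \to B'^n$, and a decoding instrument $\mathcal{D}: C^n \to \hat{M}\hat{B}$ with trace-reducing components $\{\mathcal{D}_m\}$. Write any pure state decomposition of the encoded state as

$$(\mathbb{1}^B\otimes\mathcal{E})(\Phi) = \sum_i p_i|\Upsilon_i\rangle\langle\Upsilon_i|.$$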
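Then, the success condition (8.1) for a cq entanglement transmission code can be rewritten as

$$\begin{aligned}
1-\epsilon &\leq 2^{-nR}\sum_{m=1}^{2^{nR}}P_f(m) &&(8.4)\\
&= 2^{-nR}\sum_{m=1}^{2^{nR}}F\Big(|\Phi\rangle^{B\hat{B}}, \mathcal{D}_m\circ\mathcal{N}^{\otimes n}\Big(\phi_m^{A'^n}\otimes\sum_i p_i\Upsilon_i\Big)\Big) &&(8.5)\\
&= \sum_i p_i\Big(2^{-nR}\sum_{m=1}^{2^{nR}}F\big(|\Phi\rangle^{B\hat{B}}, \mathcal{D}_m\circ\mathcal{N}^{\otimes n}(\phi_m^{A'^n}\otimes\Upsilon_i)\big)\Big) &&(8.6)\\
&= \sum_i p_i\Big(2^{-nR}\sum_{m=1}^{2^{nR}}P_f(m,\Upsilon_i)\Big),
\end{aligned}$$

where $P_f(m,\Upsilon_i)$ denotes (8.2) with $(\mathbb{1}^B\otimes\mathcal{E})(\Phi)$ replaced by $\Upsilon_i$. Hence there is a particular value $i^*$ of $i$ for which

$$2^{-nR}\sum_{m=1}^{2^{nR}}P_f(m,\Upsilon_{i^*}) \geq 1-\epsilon.$$

Hence, $(\{|\phi_m\rangle\}_{m\in 2^{nR}}, |\Upsilon_{i^*}\rangle, \mathcal{D})$ comprises an $(R, Q, n, \epsilon)$ cq entanglement generation code. ■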

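8.2.3 $\mathcal{Q}_{et} \subseteq \mathcal{Q}_{eg}$

Proof. Suppose there exists a $(Q_A, Q_B, n, \epsilon)$ entanglement transmission code $(\mathcal{E}_1, \mathcal{E}_2, \mathcal{D})$ which transmits the maximally entangled states $|\Phi_1\rangle$ and $|\Phi_2\rangle$. As in the cq case, the encoded states can be decomposed as

$$(\mathbb{1}^A\otimes\mathcal{E}_1)(\Phi_1) = \sum_i p_i\Upsilon_{1i} \quad\text{and}\quad (\mathbb{1}^B\otimes\mathcal{E}_2)(\Phi_2) = \sum_j q_j\Upsilon_{2j}.$$

The reliability condition (8.3) can then be rewritten as

$$\sum_{i,j}p_iq_j\,F\big(|\Phi_1\rangle|\Phi_2\rangle, \mathcal{D}\circ\mathcal{N}^{\otimes n}(\Upsilon_{1i}\otimes\Upsilon_{2j})\big) \geq 1-\epsilon,$$

which implies the existence of a particular pair of values $(i^*, j^*)$ such that

$$F\big(|\Phi_1\rangle|\Phi_2\rangle, \mathcal{D}\circ\mathcal{N}^{\otimes n}(\Upsilon_{1i^*}\otimes\Upsilon_{2j^*})\big) \geq 1-\epsilon.$$

Hence, $(|\Upsilon_{1i^*}\rangle, |\Upsilon_{2j^*}\rangle, \mathcal{D})$ comprises a $(Q_A, Q_B, n, \epsilon)$ entanglement generation code. ■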
8.3 Strong subspace transmission revisited

The criteria of entanglement generation and transmission, in both the cq and qq cases, are directly analogous to a requirement from classical information theory: that the probability of error, averaged over all codewords, be small. However, the requirements imposed in Section ?? are analogous to the stronger classical condition that the maximal probability of error be small, that is, that the probability of error for each pair of codewords be small. There are examples of classical multiple access channels for which the maximal error capacity region is strictly smaller than the average error region when each encoder is a deterministic function from the set of messages to the set of input symbols [?]. However, it is known that if stochastic encoders are allowed (see Problem 3.2.4 in [?]), the maximal and average error capacity regions are equal.

It is well-known that randomization is not necessary for such an equivalence to 
hold for single-user channels, as Markov's inequality implies that a fraction of the 
codewords with the worst probability of error can be purged, while incurring a neg- 
ligible loss of rate. The obstacle to utilizing such an approach for classical multiple 
access channels, and hence for quantum ones as well, is that there is no guarantee 
that a large enough subset of bad pairs of codewords decomposes as the product of 
subsets of each sender's codewords. 
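The single-user expurgation argument mentioned above can be made concrete. In the Python sketch below, which is an illustration and not taken from the original text, the function keeps the better half of a codebook. By Markov's inequality, fewer than half of the codewords can have error probability exceeding twice the average $\bar{p}_e$. Every retained codeword therefore has error at most $2\bar{p}_e$, at the cost of one bit, i.e. a rate loss of $1/n$. The function name `expurgate` is hypothetical.

```python
def expurgate(errors):
    """Keep the len(errors)//2 codewords with the smallest error probability.

    Returns (kept_indices, average_error). By Markov's inequality, fewer than
    half of the codewords can have error > 2 * average, so every kept codeword
    has error at most 2 * average.
    """
    n_words = len(errors)
    average = sum(errors) / n_words
    ranked = sorted(range(n_words), key=lambda i: errors[i])
    return ranked[: n_words // 2], average


errs = [0.0, 0.3, 0.01, 0.02, 0.9, 0.05, 0.0, 0.04]
kept, avg = expurgate(errs)
assert max(errs[i] for i in kept) <= 2 * avg
print(sorted(kept))
```

For a multiple access channel, the corresponding step would have to discard a set of message pairs that is a product of subsets of each sender's messages. As the text notes, the bad pairs need not have that structure.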

A particularly attractive feature of the requirements of Section 5.2 is that they
ensure composability; when combined with other protocols satisfying analogous crite- 
ria, the joint protocol will satisfy similar properties. As an example, recent work on 
organizing and classifying quantum Shannon-theoretic protocols by means of resource 
inequalities makes heavy use of such concatenation of quantum information pro- 
cessing protocols. 

In the next two subsections, we cast the requirements outlined in Section ?? into
somewhat simpler forms which are specific to each of the cq and qq cases. We will 
use these forms in order to prove the equivalences of entanglement transmission and 
strong subspace transmission in both the cq and qq cases. 
8.3.1 Classical-quantum scenario

Strong subspace transmission can be considered a more ambitious version of entanglement transmission. Rather than requiring Bob to transmit half of a maximally entangled state $|\Phi\rangle^{B\tilde{B}}$, it requires that he faithfully transmit the $\tilde{B}$ part, presented to him, of any bipartite pure state $|\Psi\rangle^{B\tilde{B}}$, where $|B|$ can be any finite number. The reader should note that this constitutes a generalization of the usual subspace transmission [?]. Indeed, whenever $|\Psi\rangle^{B\tilde{B}} = |\psi\rangle^B|\varphi\rangle^{\tilde{B}}$, this amounts to requiring that $|\varphi\rangle$ be transmitted faithfully. We further demand that the maximal error probability for the classical messages be small.

As with entanglement transmission, Alice will send classical information at rate $R$ by preparing one of $2^{nR}$ pure states $\{|\phi_m\rangle^{A'^n}\}_{m\in 2^{nR}}$. As previously discussed, our more restrictive information transmission constraints can only be met by allowing Alice to employ a stochastic encoding. We assume that Alice begins by generating some randomness, modeled by the random variable $X$. To send message $M = m$, she prepares a state $\phi_{f(m)}$, where $f(m) = f_X(m)$ is a random encoding function depending on the randomness in $X$. In the language of Section ??, this amounts to the definition of a c $\to$ q encoding function $M \to A'^n$. Observe that our definition there already allows for randomness to be part of the encoding process.

Bob will apply an encoding $\mathcal{E}: \tilde{B} \to B'^n$ (this is just his encoding $\mathcal{E}_2$ from Section ?? without a classical input), and Charlie will employ a decoding instrument $\mathcal{D}: C^n \to \hat{M}\hat{B}$. These maps require a more complicated structure than was required for entanglement generation and transmission. Indeed, they will be constructed, by means of a protocol described below, out of the entanglement transmission codes which were proved to exist in Section 8.2.1. The success probability for the protocol, conditioned on $m$ being sent and $|\Psi\rangle^{B\tilde{B}}$ being presented, can be expressed as
We will say that $(f, X, \{|\phi_m\rangle\}_{m\in 2^{nR}}, \mathcal{E}, \mathcal{D})$ is an $(R, Q, n, \epsilon)$ cq strong subspace transmission code for the channel $\mathcal{N}$ if, for every $m \in 2^{nR}$ and every $|\Psi\rangle^{B\tilde{B}}$,

$$\mathbb{E}_X P(m, \Psi) \geq 1-\epsilon. \tag{8.8}$$

The rate pair $(R, Q)$ is an achievable cq rate pair for strong subspace transmission if there is a sequence of $(R, Q, n, \epsilon_n)$ cq random strong subspace transmission codes with $\epsilon_n \to 0$. The capacity region $\mathcal{CQ}(\mathcal{N})$ is the closure of the collection of all such achievable rates.

8.3.2 Quantum-quantum scenario

This scenario is the obvious combination of the relevant concepts from the previous scenario and the qq entanglement transmission scenario. Alice and Bob are respectively presented with the $\tilde{A}$ and $\tilde{B}$ parts of some pure bipartite states $|\Psi_1\rangle^{A\tilde{A}}$ and $|\Psi_2\rangle^{B\tilde{B}}$. As before, we place no restriction on $|A|$ and $|B|$, other than that they are finite. They employ their respective encodings $\mathcal{E}_1$ and $\mathcal{E}_2$ (which are just the encodings from Section ?? without classical inputs), while Charlie decodes with $\mathcal{D}$. As in the above cq case, the structure of these maps will be more complicated than in the previous two scenarios. $(\mathcal{E}_1, \mathcal{E}_2, \mathcal{D})$ is then a $(Q_A, Q_B, n, \epsilon)$ qq strong subspace transmission code if

$$F\big(|\Psi_1\rangle^{A\hat{A}}|\Psi_2\rangle^{B\hat{B}}, \mathcal{D}\circ\mathcal{N}^{\otimes n}\circ(\mathcal{E}_1\otimes\mathcal{E}_2)(\Psi_1^{A\tilde{A}}\otimes\Psi_2^{B\tilde{B}})\big) \geq 1-\epsilon \tag{8.9}$$

for every pair of pure bipartite states $|\Psi_1\rangle^{A\tilde{A}}$ and $|\Psi_2\rangle^{B\tilde{B}}$. Achievable rates and the capacity region $\mathcal{Q}(\mathcal{N})$ are defined as in the cq case.
8.4 Equivalence of entanglement transmission and strong subspace transmission

Let us first prove the easy directions. To see that $\mathcal{CQ} \subseteq \mathcal{CQ}_{et}$, consider a strong subspace transmission code. Averaged over Alice's locally generated randomness $X$, the average classical error is at most the expected maximal classical error of the randomized code. Hence some deterministic value $x$ of $X$ yields an average classical error no larger than that. Since the ability to transmit any state includes the maximally entangled case, this completes the claim. The inclusion $\mathcal{Q} \subseteq \mathcal{Q}_{et}$ follows trivially: as any states can be transmitted, this certainly includes the case of a pair of maximally entangled states.

8.4.1 CQet C CQ 

Proof. Suppose there exists an $(R, Q, n, \epsilon^2/2)$ entanglement transmission code with classical message states $\{|\phi_m\rangle^{A'^n}\}_{m\in 2^{nR}}$, quantum encoding $\mathcal{E}: \tilde{B} \to B'^n$, and decoding instrument $\mathcal{D}: C^n \to \hat{M}\hat{B}$ with trace-reducing components $\{\mathcal{D}_m: C^n \to \hat{B}\}$. Suppose further that it transmits a maximally entangled state $|\Phi\rangle^{B\tilde{B}}$, where $|B| < \infty$ (indeed, $|B| = 2^{nQ}$).

We will initially prove the equivalence by constructing a code which requires two 
independent sources of shared common randomness X and Y. X is assumed to 
be available to Alice and to Charlie, while Y is available to Bob and to Charlie. 
Then, we will argue that it is possible to eliminate the dependence on the shared 
randomness, by using the channel to send a negligibly small "random seed", which can
be recycled to construct a code which asymptotically achieves the same performance 
as the randomized one. 

We begin by demonstrating how shared common randomness between Alice and Charlie allows Alice to send any message with low probability of error. Setting $\mu = 2^{nR}$, let the random variable $X$ be uniformly distributed on the set $\{1, \ldots, \mu\}$. To send message $M = m$, Alice computes $m' = m + X$ modulo $\mu$. She then prepares the state $|\phi_{m'}\rangle$ for transmission through the channel. Bob encodes the $\tilde{B}$ part of $|\Phi\rangle^{B\tilde{B}}$ with $\mathcal{E}$, and each sends appropriately through the channel. Charlie decodes as usual with the instrument $\mathcal{D}$. Denoting the classical output as $M'$, his declaration of Alice's message is then $\hat{M} = M' - X$ modulo $\mu$. Defining the trace-reducing maps

$$\mathcal{M}_m(\rho) = \mathcal{D}_m\circ\mathcal{N}^{\otimes n}\big(\phi_m\otimes\mathcal{E}(\rho)\big)$$

and the trace-reducing average map as

$$\mathcal{M} = \frac{1}{\mu}\sum_{m=1}^{\mu}\mathcal{M}_m,$$

we can rewrite the success criterion (8.1) for entanglement transmission as

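$$F(|\Phi\rangle, \mathcal{M}(\Phi)) \geq 1-\epsilon^2/2,$$

which, together with (6.3), implies that for the identity map $\mathrm{id}: \tilde{B} \to \hat{B}$,

$$\|(\mathcal{M}-\mathrm{id})(\Phi)\|_1 \leq \epsilon. \tag{8.10}$$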

The above randomization of the classical part of the protocol can be mathematically expressed by replacing the $\mathcal{M}_m$ with $\mathcal{M}_{m+X}$. As tracing over the common randomness $X$ is equivalent to computing the expectation with respect to $X$, we see that $\mathbb{E}_X\mathcal{M}_{m+X} = \mathcal{M}$, or rather

$$\mathbb{E}_X F(|\Phi\rangle, \mathcal{M}_{m+X}(\Phi)) = F(|\Phi\rangle, \mathcal{M}(\Phi)).$$

It is thus clear that the maximal error criterion for the randomized protocol is equal to the average criterion for the original one.
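The classical randomization step is easy to simulate. The Python sketch below is an illustration and assumes that the original code's per-message error probabilities are given as exact fractions. After the shift $m' = m + X \bmod \mu$, the error probability for message $m$ is the average of the original errors over all $m'$, so every message sees exactly the original average error. The function name `shifted_error` is hypothetical, and messages are indexed from 0 rather than 1.

```python
from fractions import Fraction


def shifted_error(errors, m):
    """Error probability for message m under the shared-randomness shift.

    errors[k] is the probability that the original decoder mis-decodes
    codeword k. Alice sends k = (m + x) % mu with x uniform on range(mu);
    Charlie outputs (k_hat - x) % mu, which is wrong exactly when k_hat != k.
    """
    mu = len(errors)
    return sum(Fraction(errors[(m + x) % mu]) for x in range(mu)) / mu


errors = [Fraction(1, 10), Fraction(0), Fraction(1, 2), Fraction(3, 10)]
average = sum(errors) / len(errors)
for m in range(len(errors)):
    # Maximal error of the randomized code equals the average error of the original.
    assert shifted_error(errors, m) == average
print(average)  # 9/40
```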

We continue by randomizing the quantum part of the classically randomized protocol. Setting $d = 2^{nQ} = |B|$, let $\{U_y\}_{y\in d^2}$ be the collection of Weyl unitaries, or generalized Pauli operators, on the $d$-dimensional input space. Observe that for any density operator $\rho$, acting with a uniformly random choice of Weyl unitary has a completely randomizing effect, in the sense that

$$\frac{1}{d^2}\sum_{y=1}^{d^2}U_y\rho U_y^\dagger = \pi^{\tilde{B}}.$$
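The completely randomizing property of the Weyl operators can be checked directly in a small dimension. The Python sketch below is written for illustration with the standard library only, and all function names are hypothetical. It builds the $d^2$ operators $X^aZ^b$, twirls a pure qutrit state, and confirms that the result is $\mathbb{1}/d$. The labelling $y \leftrightarrow (a, b)$ is our assumption; any enumeration of the $d^2$ Weyl operators gives the same average.

```python
import cmath


def weyl_unitaries(d):
    """All d*d generalized Pauli operators X^a Z^b as nested lists.

    (X^a Z^b)|k> = omega^(b*k) |k + a mod d>, with omega = exp(2*pi*i/d).
    """
    omega = cmath.exp(2j * cmath.pi / d)
    ops = []
    for a in range(d):
        for b in range(d):
            u = [[0j] * d for _ in range(d)]
            for k in range(d):
                u[(k + a) % d][k] = omega ** (b * k)
            ops.append(u)
    return ops


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def dagger(x):
    n = len(x)
    return [[x[j][i].conjugate() for j in range(n)] for i in range(n)]


def weyl_twirl(rho):
    """Return (1/d^2) * sum_y U_y rho U_y^dagger."""
    d = len(rho)
    out = [[0j] * d for _ in range(d)]
    for u in weyl_unitaries(d):
        term = matmul(matmul(u, rho), dagger(u))
        for i in range(d):
            for j in range(d):
                out[i][j] += term[i][j] / d ** 2
    return out


# A pure qutrit state |psi> = (|0> + i|1> + 2|2>)/sqrt(6).
psi = [1 / 6 ** 0.5, 1j / 6 ** 0.5, 2 / 6 ** 0.5]
rho = [[psi[i] * psi[j].conjugate() for j in range(3)] for i in range(3)]
twirled = weyl_twirl(rho)
for i in range(3):
    for j in range(3):
        target = 1 / 3 if i == j else 0
        assert abs(twirled[i][j] - target) < 1e-12
print("twirled state is the maximally mixed state 1/3")
```

The $Z^b$ average removes the off-diagonal entries, and the $X^a$ average then spreads the diagonal uniformly. This is why exactly $d^2 = 2^{2nQ}$ values of $Y$, i.e. $2nQ$ shared bits, are used.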

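Let the random variable $Y$ be uniformly distributed on $\{1, \ldots, d^2\}$. It will be convenient to define the common randomness state

$$\Upsilon^{Y_BY_C} = \frac{1}{d^2}\sum_{y=1}^{d^2}|y\rangle\langle y|^{Y_B}\otimes|y\rangle\langle y|^{Y_C},$$

where the system $Y_B$ is in the possession of Bob, while $Y_C$ is possessed by Charlie. Define now the controlled unitaries $\mathcal{U}_B: Y_B\tilde{B} \to Y_B\tilde{B}$ and $\mathcal{U}_C: Y_C\hat{B} \to Y_C\hat{B}$ by

$$U_B = \sum_{y=1}^{d^2}|y\rangle\langle y|^{Y_B}\otimes U_y$$

and

$$U_C = \sum_{y=1}^{d^2}|y\rangle\langle y|^{Y_C}\otimes U_y^\dagger.$$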

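Suppose Bob is given the $\tilde{B}$ part of an arbitrary pure state $|\Psi\rangle^{B\tilde{B}}$, and Alice sends the classical message $M = m$. For encoding, Bob will apply $\mathcal{E}\circ\mathcal{U}_B$ to the combined system $\Upsilon\otimes\Psi$. Charlie decodes with $\mathcal{U}_C\circ\mathcal{D}$. If the coded channel were equal to the perfect quantum channel $\mathrm{id}: \tilde{B} \to \hat{B}$, this procedure would result in the state

$$\frac{1}{d^2}\sum_{y=1}^{d^2}|y\rangle\langle y|^{Y_B}\otimes|y\rangle\langle y|^{Y_C}\otimes\Psi^{B\hat{B}} = \Upsilon^{Y_BY_C}\otimes\Psi^{B\hat{B}}.$$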

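Note that the common randomness is still available for reuse. Abbreviating $|y\rangle\langle y|^Y = |y\rangle\langle y|^{Y_B}\otimes|y\rangle\langle y|^{Y_C}$ and $|\Phi_y\rangle = (\mathbb{1}^B\otimes U_y)|\Phi\rangle$, we write

$$\sigma^{YB\tilde{B}} = \frac{1}{d^2}\sum_{y=1}^{d^2}|y\rangle\langle y|^Y\otimes\Phi_y, \tag{8.11}$$

$$|\Gamma\rangle^{YB\tilde{B}} = \frac{1}{d}\sum_{y=1}^{d^2}|y\rangle^Y|\Phi_y\rangle. \tag{8.12}$$

Observe that $\sigma$ is an extension of the maximally mixed state $\pi^{\tilde{B}}$. It can be seen to arise by performing a von Neumann measurement along the basis $\{|y\rangle^Y\}_{y\in d^2}$ on the $Y$ part of the pure state $|\Gamma\rangle$ and storing the result in $Y$. Since $\mathrm{Tr}_{YB}\Gamma = \mathrm{Tr}_{YB}\sigma = \pi^{\tilde{B}}$, $|\Gamma\rangle$ is maximally entangled between $YB$ and $\tilde{B}$. So, there exists an isometry $V: B \to YB$ such that $(V\otimes\mathbb{1}^{\tilde{B}})|\Phi\rangle^{B\tilde{B}} = |\Gamma\rangle$. This implies that there is a quantum operation $\mathcal{O}: B \to YB$ such that $(\mathcal{O}\otimes\mathbb{1}^{\tilde{B}})(\Phi) = \sigma$. Define the trace-reducing map $\mathcal{T}: \tilde{B} \to \hat{B}$, which represents the coded channel with common randomness accounted for, by



Recalling our notation $\mathrm{id}: \tilde{B} \to \hat{B}$ for the noiseless quantum channel, and our convention that $\mathrm{id}$ acts as the identity on any system which is not $\tilde{B}$, we now bound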


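$$\begin{aligned}
1 - F(|\Phi\rangle, \mathcal{T}(\Phi)) &\leq \|(\mathcal{T}-\mathrm{id})(\Phi)\|_1\\
&\leq \|(\mathcal{U}_C\circ\mathcal{M}\circ\mathcal{U}_B - \mathrm{id})(\Upsilon\otimes\Phi)\|_1\\
&= \|(\mathcal{M}-\mathrm{id})\circ\mathcal{U}_B(\Upsilon\otimes\Phi)\|_1\\
&= \|(\mathcal{M}-\mathrm{id})(\sigma)\|_1\\
&\leq \|(\mathcal{M}-\mathrm{id})(\Phi)\|_1\\
&\leq \epsilon,
\end{aligned}$$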
where the first line is by (6.1) and the second by monotonicity with respect to $\mathrm{Tr}_Y$. The third follows from unitary invariance of the trace norm, and the fourth is just (8.11). The second to last inequality is a consequence of monotonicity with respect to $\mathcal{O}$, while the last is by (8.10). Note that, by monotonicity, this implies that any density matrix $\Omega$ satisfies

$$\|\mathcal{T}(\Omega) - \Omega\|_1 \leq \epsilon. \tag{8.13}$$

We have thus shown that if Alice and Charlie have access to a common randomness
source of rate $R$, while Bob and Charlie can access one of rate $2Q$, the conditions
for strong subspace transmission can be satisfied. Next, we will show that, by
modifying our protocol, it is possible to reduce the amount of shared randomness
required. Using the previous blocklength-$n$ construction, we will concatenate $N$ such
codes, each utilizing the same shared randomness, to construct a new code with
blocklength $nN$. For an arbitrary $|\Psi^{(N)}\rangle$, further define the commuting operations
$\{\mathcal{T}_i\}_{i=1}^{N}$, where $\mathcal{T}_i: B_i \to B_i$ is $\mathcal{T}$ acting on the $i$th tensor factor of $\Psi^{(N)}$.
Setting $\Omega_0 = \Psi^{(N)}$, we then recursively define the density operators $\Omega_i = \mathcal{T}_i(\Omega_{i-1})$,
noting that $\Omega_N = \mathcal{T}_N \circ \cdots \circ \mathcal{T}_1(\Omega_0) = \mathcal{T}^{\otimes N}(\Psi^{(N)})$. Because of (8.13),
$\|\Omega_i - \Omega_{i-1}\|_1 \le \epsilon$, and we can use the triangle inequality to estimate

$$\|\mathcal{T}^{\otimes N}(\Psi^{(N)}) - \Psi^{(N)}\|_1 \le \sum_{i=1}^{N} \|\Omega_i - \Omega_{i-1}\|_1 \le N\epsilon.$$

By choosing $N = 1/\sqrt{\epsilon}$, it is clear that we have reduced Alice's and Bob's shared
randomness rates respectively to $\sqrt{\epsilon}R$ and $2\sqrt{\epsilon}Q$, while the error on the $N$-blocked
protocol is now $\sqrt{\epsilon}$. Next, we argue that by using two more blocks of length $n$, it is
possible to simulate the shared randomness by having Alice send $nR$ random bits $X$
using the first block, while Bob locally prepares two copies of $\Phi$, namely $\Phi^{F_1B_1}$ and
$\Phi^{F_2B_2}$, and transmits the $B_1B_2$ parts over the channel using both blocks. Charlie decodes each
block separately, obtaining a random variable $\hat{X}$ and the $B_1$ and $B_2$ parts of the
post-decoded states $\Omega^{F_1B_1}$ and $\Omega^{F_2B_2}$. Bob and Charlie then measure their respective parts
of $\Omega^{F_1B_1} \otimes \Omega^{F_2B_2}$ in some previously agreed upon orthogonal bases to obtain a simulation
$\tilde{\Gamma}$ of the perfect shared randomness state $\Gamma$ which, by monotonicity and telescoping,
satisfies

$$\|\tilde{\Gamma} - \Gamma\|_1 \le \|\Phi\otimes\Phi - \Omega^{F_1B_1}\otimes\Omega^{F_2B_2}\|_1 \le \epsilon'.$$
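As a sanity check on the telescoping step (and on the $N\epsilon$ estimate above), here is a purely classical analogue: for probability distributions the $\ell_1$ distance plays the role of the trace norm, and the same triangle-inequality argument bounds the distance between $N$-fold products. The "ideal" and "noisy" distributions below are hypothetical stand-ins chosen for illustration, not quantities taken from the protocol.

```python
from itertools import product

def l1(p, q):
    """l1 (classical trace) distance between two distributions stored as dicts."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def tensor(p, q):
    """Product distribution p x q, keyed by concatenated outcome tuples."""
    return {a + b: pa * qb for (a, pa), (b, qb) in product(p.items(), q.items())}

def nfold(p, n):
    """n-fold product distribution of p."""
    out = {(): 1.0}
    for _ in range(n):
        out = tensor(out, p)
    return out

eps = 0.01
ideal = {(0,): 1.0, (1,): 0.0}                # hypothetical perfect block
noisy = {(0,): 1.0 - eps / 2, (1,): eps / 2}  # hypothetical block at l1 distance eps

for n in (1, 2, 5, 10):
    d = l1(nfold(ideal, n), nfold(noisy, n))
    # telescoping bound: distance of n-fold products is at most n * eps
    print(n, round(d, 6), d <= n * eps + 1e-12)
```

Here the exact distance is $2(1-(1-\epsilon/2)^n)$, which indeed stays below $n\epsilon$.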

Further, the noisy shared randomness for the classical messages can be shown to
satisfy

$$\|\mathrm{dist}(X,X) - \mathrm{dist}(X,\hat{X})\|_1 = 2\Pr\{X \ne \hat{X}\} \le \epsilon'.$$

By monotonicity of trace distance and the triangle inequality, using the noisy common
randomness state $\tilde{\Gamma}$ increases the estimate for each block by $2\epsilon'$. For identical reasons,
the same increase is incurred by using the noisy common randomness $(X, \hat{X})$. Thus,
accounting for both sources of noisy common randomness, the estimate (8.13) is
changed to $2\epsilon$, provided that $\epsilon'$ is sufficiently small compared with $\epsilon$. The noisy common
randomness thus increases the bound on the error of the $N$-blocked protocol to $2\sqrt{\epsilon}$, while
costing each of Alice and Bob a negligible rate overhead in order to seed the protocol.

The above protocol can be considered as defining an encoding map $\mathcal{E}'$ onto $(N+2)n$
channel inputs and a decoding instrument $\mathcal{D}'$ acting on $C^{(N+2)n}$. Thus, the protocol takes
an $(R, Q, n, \epsilon_n)$ cq entanglement transmission code and constructs an $(R', Q', n', \epsilon'_{n'})$
strong subspace transmission code with cq rate pair

$$(R', Q') = \left(\frac{R}{1+2\sqrt{\epsilon_n}}, \frac{Q}{1+2\sqrt{\epsilon_n}}\right),$$

where $n' = \left(2 + \frac{1}{\sqrt{\epsilon_n}}\right)n$ and $\epsilon'_{n'} = 2\sqrt{\epsilon_n}$. Now, if the rates $(R, Q)$ are achievable cq rates for
entanglement transmission, there must exist a sequence of $(R, Q, n, \epsilon_n)$ entanglement
transmission codes with $\epsilon_n \to 0$. Since this means that $1/(1+2\sqrt{\epsilon_n})$ increases to unity, we
have shown that for any $\delta > 0$, every rate pair $(R-\delta, Q-\delta)$ is an achievable cq rate
pair for strong subspace transmission. Since the capacity regions for each scenario
are defined as the closure of the achievable rates, this completes the proof. ■
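The rate bookkeeping above is simple arithmetic: $N$ data blocks of length $n$ plus two seeding blocks give $n' = (N+2)n$, so each rate is scaled by $N/(N+2)$, which equals $1/(1+2\sqrt{\epsilon})$ when $N = 1/\sqrt{\epsilon}$. The sketch below checks this; the function name and the sample values of $R$, $Q$, $n$, $\epsilon$ are hypothetical, and it assumes $1/\sqrt{\epsilon}$ is an integer.

```python
import math

def blocked_code_parameters(R, Q, n, eps):
    """Parameter bookkeeping for the N-blocked code (sketch).

    Assumes N = 1/sqrt(eps) is (rounded to) an integer: N data blocks of
    length n reuse one batch of common randomness, and 2 extra blocks of
    length n seed that randomness, for (N + 2) n channel uses in total.
    """
    N = round(1 / math.sqrt(eps))
    n_prime = (N + 2) * n
    R_prime = N * n * R / n_prime      # N blocks of nR classical bits
    Q_prime = N * n * Q / n_prime      # N blocks of nQ qubits
    err = 2 * math.sqrt(eps)           # error bound after noisy seeding
    return N, n_prime, R_prime, Q_prime, err

N, n_prime, R_p, Q_p, err = blocked_code_parameters(R=1.0, Q=0.5, n=100, eps=1e-4)
print(N, n_prime, R_p, Q_p, err)
# with N = 1/sqrt(eps) exactly, R' = R / (1 + 2 sqrt(eps))
print(abs(R_p - 1.0 / (1 + 2 * math.sqrt(1e-4))) < 1e-12)
```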
8.4.2 $\mathcal{Q}_{\mathrm{et}} \subseteq \mathcal{Q}$

Proof. We will employ techniques similar to those used in the previous proof to obtain
this implication. Suppose there exists a $(Q_A, Q_B, n, \tfrac{1}{2}\epsilon)$ qq entanglement transmission
code $(\mathcal{E}_1, \mathcal{E}_2, \mathcal{D})$, with $\mathcal{E}_1: A \to A'^n$, $\mathcal{E}_2: B \to B'^n$, and $\mathcal{D}: C^n \to AB$. Setting
$a = |A| = 2^{nQ_A}$ and $b = |B| = 2^{nQ_B}$, define the common randomness states

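$$\Gamma_1 = \frac{1}{a^2}\sum_{x=1}^{a^2}|x\rangle\langle x|\otimes|x\rangle\langle x|$$

and

$$\Gamma_2 = \frac{1}{b^2}\sum_{y=1}^{b^2}|y\rangle\langle y|\otimes|y\rangle\langle y|.$$

These states will be used as partial inputs to the controlled unitaries

$$U_A = \sum_{x=1}^{a^2}|x\rangle\langle x|\otimes U_x, \qquad U_C = \sum_{x=1}^{a^2}|x\rangle\langle x|\otimes U_x^\dagger,$$

$$V_B = \sum_{y=1}^{b^2}|y\rangle\langle y|\otimes V_y, \qquad V_C = \sum_{y=1}^{b^2}|y\rangle\langle y|\otimes V_y^\dagger,$$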
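where, as before, we have utilized the Weyl unitaries $\{U_x\}_{x\in[a^2]}$ and $\{V_y\}_{y\in[b^2]}$, which
respectively completely randomize any states on $a$-dimensional and $b$-dimensional
spaces. Suppose Alice and Bob are respectively presented with the $A$ and $B$ parts of
the arbitrary pure states $|\Psi_1\rangle$ and $|\Psi_2\rangle$. Writing $\mathcal{M} = \mathcal{D} \circ \mathcal{N}^{\otimes n} \circ (\mathcal{E}_1 \otimes \mathcal{E}_2)$ and
defining the map $\mathcal{T}: AB \to AB$ by

$$\mathcal{T}: \tau \mapsto (U_C \otimes V_C) \circ \mathcal{M} \circ (U_A \otimes V_B)(\tau \otimes \Gamma_1 \otimes \Gamma_2),$$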
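the overall joint state of the randomized protocol is given by $\mathcal{T}(\Psi_1 \otimes \Psi_2)$. Abbreviating

$$|xy\rangle\langle xy| = |x\rangle\langle x| \otimes |x\rangle\langle x| \otimes |y\rangle\langle y| \otimes |y\rangle\langle y|$$

and defining $|\Psi_{1,x}\rangle = (\mathbb{1} \otimes U_x)|\Psi_1\rangle$ and $|\Psi_{2,y}\rangle = (\mathbb{1} \otimes V_y)|\Psi_2\rangle$, we write

$$\sigma = \frac{1}{a^2b^2}\sum_{x,y} |xy\rangle\langle xy| \otimes \Psi_{1,x} \otimes \Psi_{2,y}.$$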

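By arguments similar to those in the cq case, there exists a map $\mathcal{O}: AB \to ABQR$ so that

$$(\mathcal{O} \otimes \mathbb{1})(\Phi_1 \otimes \Phi_2) = \sigma.$$

Again, for the same reasons as in the cq case, we have

$$
\begin{aligned}
\|(\mathcal{T}-\mathrm{id})(\Psi_1 \otimes \Psi_2)\|_1 &\le \|(\mathcal{M}-\mathrm{id})(\sigma)\|_1 \\
&\le \|(\mathcal{M}-\mathrm{id})(\Phi_1 \otimes \Phi_2)\|_1 \\
&\le \epsilon.
\end{aligned}
$$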

The rest of the proof is nearly identical to that of the previous section, so we omit
the details. ■



Chapter 9 

Single-letter examples 

Due to the regularized form of our Theorems 1 and 2, actually computing the
capacity regions seems generally out of reach. Here we give some examples
of channels whose capacity region does in fact admit a single-letter characterization,
in the sense that no regularization is necessary. In the first section below, we show
that a certain quantum erasure multiple access channel has an additive cq
capacity region. The next two sections describe classes of channels which have
additive single-user capacities. The contents of these two sections are essentially an
elaboration of results which appear elsewhere in jTH]. The last section demonstrates
that a certain collective phase-flip channel has an additive qq capacity region.

9.1 Proof of additivity of CQ for quantum erasure 
multiple access channel 

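Our first example is a multiple access erasure channel $\mathcal{N}: A'B' \to C$, where $|A'| = 2$,
$|B'| = d$ and $|C| = d + 1$. Alice will send classical information while Bob will send
quantum information. Fixing bases $\{|1\rangle^{B'}, \ldots, |d\rangle^{B'}\}$ and $\{|0\rangle^{C}, \ldots, |d\rangle^{C}\}$, the channel
has $d + 1$ operation elements

$$\langle 0|^{A'} \otimes |0\rangle^{C}\langle i|^{B'}, \quad i = 1, \ldots, d, \qquad \text{and} \qquad \langle 1|^{A'} \otimes \sum_{i=1}^{d} |i\rangle^{C}\langle i|^{B'}.$$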

The action of the channel can be interpreted as follows. First, a projective measurement
of Alice's input along $\{|0\rangle, |1\rangle\}$ is performed. If the result is 0, Charlie's output
is prepared in the pure state $|0\rangle$. Otherwise, Bob's input is transferred perfectly to
the remaining degrees of freedom in Charlie's output. Bob's input is "erased", that is,
ejected into the environment, whenever Alice sends $|0\rangle$, and is perfectly
preserved when she sends $|1\rangle$. Indeed, the action of $\mathcal{N}$ on $\tau^{A'} \otimes \rho^{B'}$ is given by

$$\mathcal{N}(\tau \otimes \rho) = \tau_{00}|0\rangle\langle 0| + \tau_{11}\rho.$$
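The operation elements and the formula $\mathcal{N}(\tau\otimes\rho) = \tau_{00}|0\rangle\langle 0| + \tau_{11}\rho$ can be cross-checked numerically. The pure-Python sketch below (helper names hypothetical; index conventions stated in the comments) builds the $d+1$ operation elements, and compares their action with the closed form for a sample pair of input states, which are also hypothetical.

```python
def matmul(A, B):
    """Matrix product for lists of lists."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def dagger(A):
    """Conjugate transpose."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def kron(A, B):
    """Kronecker product; row index i*len(B)+k, column index j*len(B[0])+l."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def erasure_kraus(d):
    """Operation elements as (d+1) x 2d matrices. Input index a*d + b encodes
    Alice's bit a and Bob's level |b+1>; output index 0 is |0>^C, b+1 is |b+1>^C."""
    ops = []
    for i in range(d):                          # <0|^{A'} (x) |0>^C <i+1|^{B'}
        K = [[0j] * (2 * d) for _ in range(d + 1)]
        K[0][i] = 1
        ops.append(K)
    K = [[0j] * (2 * d) for _ in range(d + 1)]  # <1|^{A'} (x) sum_i |i>^C <i|^{B'}
    for i in range(d):
        K[i + 1][d + i] = 1
    ops.append(K)
    return ops

def apply_channel(ops, rho):
    """sum over K of K rho K^dagger."""
    terms = [matmul(matmul(K, rho), dagger(K)) for K in ops]
    return [[sum(T[i][j] for T in terms) for j in range(len(terms[0][0]))]
            for i in range(len(terms[0]))]

def closed_form(tau, rho):
    """tau_00 |0><0| plus tau_11 rho on span{|1>,...,|d>}, as a (d+1) x (d+1) matrix."""
    d = len(rho)
    out = [[0j] * (d + 1) for _ in range(d + 1)]
    out[0][0] = tau[0][0]
    for i in range(d):
        for j in range(d):
            out[i + 1][j + 1] = tau[1][1] * rho[i][j]
    return out

d = 2
tau = [[0.3, 0.2 + 0.1j], [0.2 - 0.1j, 0.7]]   # hypothetical input for Alice
rho = [[0.6, 0.1j], [-0.1j, 0.4]]               # hypothetical input for Bob
ops = erasure_kraus(d)
out = apply_channel(ops, kron(tau, rho))
ref = closed_form(tau, rho)
max_dev = max(abs(out[i][j] - ref[i][j]) for i in range(d + 1) for j in range(d + 1))
print("max deviation:", max_dev)
```

Note that the off-diagonal element $\tau_{01}$ of Alice's input never reaches the output, consistent with the measurement interpretation above.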

We will show that the cq capacity region of this channel, $\mathcal{CQ}(\mathcal{N}_{\mathrm{erasure}})$, has a
single-letter characterization given by the collection of pairs of nonnegative
classical-quantum rates $(R, Q)$ such that

$$R \le H(p), \qquad Q \le (1-2p)\log d$$

for some $0 \le p \le \tfrac{1}{2}$, constituting a generalization of results in [7] on single-user
erasure channels to a multiuser setting. Figure 5.1 contains a plot of this region for
the case where $d = 2$.

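In the sense of (5.1), any state $\Omega$ which arises from $\mathcal{N}^{\otimes k}$ can be specified
by fixing some pure state ensemble $\{p(x), |\phi_x\rangle^{A'^k}\}$ and a pure bipartite state $|\Psi\rangle^{BB'^k}$.
We thus write

$$\Omega = \sum_x p(x)|x\rangle\langle x|^X \otimes (\mathbb{1}^B \otimes \mathcal{N}^{\otimes k})(\phi_x \otimes \Psi).$$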

For a binary string $y^k$, let $|y^k\rangle^{A'^k} = |y_1\rangle^{A'}\cdots|y_k\rangle^{A'}$ be the associated computational
basis state. Writing $p(y^k|x) = |\langle y^k|\phi_x\rangle|^2$ defines the random variable $Y^k$, which is
correlated with $X$ and can be interpreted as the erasure pattern associated with the
state $\Omega$. We next define another state of the form (5.1), where the summation is over
$d$-ary strings of length $k$, $j^k = (j_1, \ldots, j_k)$. Finally, for

$$q_i = \Pr\{Y_i = 0\}, \qquad q = \frac{1}{k}\sum_{i=1}^{k} q_i,$$

define a third state

$$\omega^{UBC} = q\,|0\rangle\langle 0|^U \otimes \pi^B \otimes |0\rangle\langle 0|^C + (1-q)\,|1\rangle\langle 1|^U \otimes \Phi^{BC},$$

where $\pi^B$ is the maximally mixed state on $B$ and $\Phi^{BC}$ is a maximally entangled state
between $B$ and the span of $|1\rangle^C, \ldots, |d\rangle^C$.
The above states can easily be seen to satisfy the following chain of inequalities:

$$
\begin{aligned}
I(X;C^k)_\Omega &= \cdots \le H(Y^k) \\
&\le \sum_{i=1}^{k} H(Y_i) = \sum_{i=1}^{k} H(q_i) \\
&\le kH(q) = kH(U)_\omega = kI(U;C)_\omega.
\end{aligned}
$$

The only nontrivial step above is the use of the concavity of the binary
entropy function in the last inequality. Furthermore, it is not hard to see that

$$
\begin{aligned}
I_c(B\rangle C^kX)_\Omega &\le I_c(B\rangle C^kXY^k) \\
&= I_c(B\rangle C^kY^k) \\
&= k\,I_c(B\rangle CU)_\omega.
\end{aligned}
$$

Thus, we have shown that for any state $\Omega$ arising from $\mathcal{N}^{\otimes k}$ in the sense of
(5.1), there is a state $\omega$ arising from $\mathcal{N}$ in the same sense, allowing the
multi-letter information quantities to be bounded by single-letter information quantities;
i.e., $\mathcal{CQ}(\mathcal{N}) = \mathcal{CQ}^{(1)}(\mathcal{N})$. ■
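As it is clear that $I(U;C)_\omega = H(q)$, we focus on calculating

$$
\begin{aligned}
I_c(B\rangle CU)_\omega &= q\left(H(|0\rangle\langle 0|^C) - H(\pi^B \otimes |0\rangle\langle 0|^C)\right) + (1-q)\left(H(\pi^C) - H(\Phi^{BC})\right) \\
&= q(0 - \log d) + (1-q)(\log d - 0) \\
&= (1-2q)\log d.
\end{aligned}
$$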
Note that the above quantity is a weighted average of a positive and a negative
coherent information. It is perhaps tempting to interpret these terms as follows.
The positive term can be considered as resulting from a preservation of quantum
information, while the negative term can be seen as signifying a complete loss of
quantum information to the environment. The overall coherent information is positive
only when $q < \tfrac{1}{2}$, a result which is in agreement with the result of Bennett et al. [7]
on the quantum capacity of a binary erasure channel. Varying $0 \le q \le \tfrac{1}{2}$, the rate
pairs

$$(R, Q) = \left(I(U;C), I_c(B\rangle CU)\right)_\omega = \left(H(q), (1-2q)\log d\right)$$

can be seen to parameterize the outer boundary of $\mathcal{CQ}(\mathcal{N})$, as is pictured in Figure
5.1 for the case $d = 2$.
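The single-letter region is easy to evaluate. The sketch below (function names hypothetical, rates in bits) computes the boundary points $(H(q), (1-2q)\log d)$ from the weighted-average formula for $I_c(B\rangle CU)_\omega$, and tests membership in the union of rectangles $\{0 \le R \le H(q),\ 0 \le Q \le (1-2q)\log d\}$ over a finite grid of $q \in [0, \tfrac12]$; the grid makes this an approximate test rather than an exact one.

```python
import math

def h2(q):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def coherent_info(q, d):
    """I_c(B>CU) for omega: weight q on the erased branch (0 - log d),
    weight 1 - q on the preserved branch (log d - 0)."""
    return q * (0 - math.log2(d)) + (1 - q) * (math.log2(d) - 0)

def boundary_point(q, d):
    """Boundary pair (R, Q) = (H(q), (1 - 2q) log d), for 0 <= q <= 1/2."""
    return h2(q), coherent_info(q, d)

def in_region(R, Q, d, steps=10_000):
    """Approximate membership test: is (R, Q) dominated by some grid boundary point?"""
    if R < 0 or Q < 0:
        return False
    for j in range(steps + 1):
        q = 0.5 * j / steps
        Rq, Qq = boundary_point(q, d)
        if R <= Rq + 1e-12 and Q <= Qq + 1e-12:
            return True
    return False

for q in (0.0, 0.11, 0.25, 0.5):
    print(q, boundary_point(q, d=2))
print(in_region(0.5, 0.4, d=2), in_region(0.9, 0.5, d=2))
```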

As an aside, we remark that this calculation, together with the quantum channel
capacity theorem from [21], gives a direct derivation of the quantum capacity of a
quantum erasure channel, without relying on the no-cloning and hashing arguments
used in [7].

9.2 Degradable channels 

While the single-user capacity $Q(\mathcal{N})$ of an arbitrary quantum channel $\mathcal{N}: A' \to B$
is known not to be additive in general, there is a certain class of channels for which
additivity follows relatively easily. This is the class of so-called degradable channels
P!3]. A channel $\mathcal{N}$ is degradable if its complement $\mathcal{N}^c: A' \to E$ is a stochastically
degraded version of $\mathcal{N}$, i.e., if there exists a degrading channel $\mathcal{N}^d: B \to E$ such that

$$\mathcal{N}^c = \mathcal{N}^d \circ \mathcal{N}.$$

Below, we will give a version of the proof from of the additivity of the quantum 
capacity of an arbitrary degradable channel. Then, we argue that the maximum sum 
rate bound of the qq capacity region is additive for such channels.

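Assume that $\mathcal{N}_1: A'_1 \to C_1$ and $\mathcal{N}_2: A'_2 \to C_2$ are degradable, with isometric
extensions $U_i: A'_i \to C_iE_i$. Fix an input state $|\Psi\rangle^{AA'_1A'_2}$, which gives rise to the global
state $\Omega = (U_1 \otimes U_2)(\Psi)$. By degradability, there exist channels $\mathcal{N}_i^d$ such that
$\mathcal{N}_i^c = \mathcal{N}_i^d \circ \mathcal{N}_i$, where $\mathcal{N}_i^c = \mathrm{Tr}_{C_i} U_i$. Letting $V_i: C_i \to E_iF_i$ isometrically extend each
$\mathcal{N}_i^d$, define $\theta = (V_1 \otimes V_2)(\mathrm{Tr}_{E_1E_2}\Omega)$. Then

$$
\begin{aligned}
I_c(A\rangle C_1C_2)_\Omega &= H(C_1C_2)_\Omega - H(E_1E_2)_\Omega \\
&= H(F_1F_2E_1E_2)_\theta - H(E_1E_2)_\theta \\
&= H(F_1F_2|E_1E_2)_\theta \\
&\le H(F_1|E_1)_\theta + H(F_2|E_2)_\theta \\
&= H(F_1E_1)_\theta - H(E_1)_\theta + H(F_2E_2)_\theta - H(E_2)_\theta \\
&= H(C_1)_\Omega - H(E_1)_\Omega + H(C_2)_\Omega - H(E_2)_\Omega \\
&= H(C_1)_\Omega - H(AC_1C_2E_2)_\Omega + H(C_2)_\Omega - H(AC_1C_2E_1)_\Omega \\
&= I_c(AC_2E_2\rangle C_1)_\Omega + I_c(AC_1E_1\rangle C_2)_\Omega \\
&= I_c(A_1\rangle C_1)_{\omega_1} + I_c(A_2\rangle C_2)_{\omega_2},
\end{aligned}
$$

where the inequality is by Lemma |H|. In the last line, we set $\omega_i = \mathcal{N}_i(\Psi)$, identifying
$A_1 = AA'_2$ and $A_2 = AA'_1$. All other steps follow either from the fact that isometries
preserve entropy or from trivial rewritings.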
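For the reader who wants the inequality step spelled out, one standard route is the chain rule followed by two applications of "conditioning does not increase entropy" (a consequence of strong subadditivity). This is offered as a sketch and may differ in form from the argument of the cited lemma:

```latex
\begin{aligned}
H(F_1F_2|E_1E_2) &= H(F_1|E_1E_2) + H(F_2|F_1E_1E_2) && \text{(chain rule)} \\
&\le H(F_1|E_1) + H(F_2|E_2) && \text{(conditioning does not increase entropy)}
\end{aligned}
```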

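Now, if we are given $k$ identical channels $\mathcal{N}: A' \to C$ and we fix an input state
$|\Psi\rangle^{AA'^k}$ giving rise to $\Omega^{AC^kE^k} = U^{\otimes k}(\Psi)$, recursive application of the above
yields

$$I_c(A\rangle C^k)_\Omega \le \sum_{i=1}^{k} I_c(A_i\rangle C_i)_{\omega_i},$$

where $A_i = AA'_1\cdots A'_{i-1}A'_{i+1}\cdots A'_k$, $\omega_i = \mathcal{N}_i(\Psi)$, and $\mathcal{N}_i$ is $\mathcal{N}$ acting on the $i$th
tensor factor. Choosing

$$i^* = \arg\max_i \{I_c(A_i\rangle C_i)_{\omega_i}\}$$

yields

$$\frac{1}{k}I_c(A\rangle C^k)_\Omega \le I_c(A_{i^*}\rangle C_{i^*})_{\omega_{i^*}} \le \max_\omega I_c(A\rangle C)_\omega = Q^{(1)}(\mathcal{N}),$$

where the maximization is over all $\omega = \mathcal{N}(\phi^{AA'})$.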

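Let us phrase this conclusion using different notation. Let $\tau$ be an arbitrary input state
on $k$ channel uses, and define $\tau_i = \mathrm{Tr}_{\overline{A_iB_i}}\,\tau$, where $\mathrm{Tr}_{\overline{A_iB_i}}$ denotes the partial trace over all
systems which are not $A_iB_i$. Then

$$I_c(\tau, \mathcal{N}^{\otimes k}) \le k\,I_c(\tau_{i^*}, \mathcal{N}),$$

where

$$i^* = \arg\max_i I_c(\tau_i, \mathcal{N}).$$

Now, if $\rho^{A^k}$ and $\sigma^{B^k}$ are arbitrary, and we define $\rho_i = \mathrm{Tr}_{\overline{A_i}}\,\rho$ and $\sigma_i = \mathrm{Tr}_{\overline{B_i}}\,\sigma$,
observe that if $\tau = \rho \otimes \sigma$, then $\tau_i = \rho_i \otimes \sigma_i$. This immediately implies that

$$I_c(\rho \otimes \sigma, \mathcal{N}^{\otimes k}) \le k\,I_c(\rho_{i^*} \otimes \sigma_{i^*}, \mathcal{N}),$$

where

$$i^* = \arg\max_i I_c(\rho_i \otimes \sigma_i, \mathcal{N}),$$

proving that the maximum sum rate of any degradable channel is additive, even when
the inputs are restricted to be product states. This fact will be useful in Section 9.4,
where we give a channel whose qq capacity region is single-letter.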

9.3 Generalized dephasing channels 

In this section we describe a certain subclass of the class of degradable channels.
These are channels $\mathcal{N}: A' \to B$ with $|A'| = |B| = d$ for which there is a particular
orthogonal basis $\{|x\rangle^{A'}\}$ which can be transmitted through the channel without error,

$$\mathcal{N}(|x\rangle\langle x|) = |x\rangle\langle x|^B,$$

although superpositions of these basis vectors are potentially subject to noise. Here,
$\{|x\rangle^B\}$ is a corresponding orthogonal basis for $B$. Such a channel has an isometric
extension $U: A' \to BE$ given by

$$U = \sum_x |x\rangle^B|\phi_x\rangle^E\langle x|^{A'},$$

where the states $|\phi_x\rangle^E$ are not necessarily orthogonal. To see that these channels are
degradable, observe that for any input state $\rho = \sum_{x,x'}\rho_{xx'}|x\rangle\langle x'|$,

$$\mathcal{N}(\rho) = \sum_{x,x'}\rho_{xx'}\langle\phi_{x'}|\phi_x\rangle\,|x\rangle\langle x'|, \qquad \mathcal{N}^c(\rho) = \sum_x \rho_{xx}\,\phi_x.$$

Note that $\mathcal{N}^c(\rho)$ depends only on the diagonal matrix elements of $\rho$ (when it is
expressed in the dephasing basis). However, these are exactly the matrix elements
which are unaffected by the action of $\mathcal{N}$, making degradability evident. In fact, the
degrading channel is precisely $\mathcal{N}^c$, i.e.,

$$\mathcal{N}^c = \mathcal{N}^c \circ \mathcal{N}.$$



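It is interesting to relate the isometric extension $U_{\mathcal{N}}$ to the operator sum representation
for $\mathcal{N}$. To do this, first express $U_{\mathcal{N}}$ in the flattened representation

$$U_{\mathcal{N}} = \begin{pmatrix} |\phi_1\rangle & 0 & \cdots & 0 \\ 0 & |\phi_2\rangle & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & |\phi_d\rangle \end{pmatrix}.$$

Supposing that $|E| = k$, note that the matrix is "block diagonal", with $d$ blocks of size $k \times 1$,
where this is expressed as a map to the system $EB$. Regrouping the rows into $k$
groups of size $d$, we rewrite

$$U_{\mathcal{N}} = \begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_k \end{pmatrix}, \qquad A_e = \begin{pmatrix} \langle e|\phi_1\rangle & & \\ & \ddots & \\ & & \langle e|\phi_d\rangle \end{pmatrix}.$$

This is just the flattened representation for the map to the system $BE$ (the order
of $E$ and $B$ has been reversed). Note that we have identified the $|E|$ blocks with the
matrices of the operator sum representation

$$\mathcal{N}(\rho) = \sum_{e=1}^{k} A_e\,\rho\,A_e^\dagger.$$

So we see that the operator sum matrices are all diagonal in the $\{|x\rangle\}$ basis and are
given explicitly as

$$A_e = \sum_x \langle e|\phi_x\rangle\,|x\rangle\langle x|.$$

Reversing the above steps, it is clear that $\mathcal{N}$ is a generalized dephasing channel if and
only if it has an operator sum representation consisting of matrices which commute.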
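A quick numerical check of the correspondence between the isometry and the diagonal operation elements may help. The sketch below (pure Python; helper names and the sample vectors $|\phi_x\rangle$ and input state are hypothetical) builds $A_e = \sum_x \langle e|\phi_x\rangle|x\rangle\langle x|$ and compares $\sum_e A_e\rho A_e^\dagger$ with the entrywise formula $\mathcal{N}(\rho)_{xx'} = \rho_{xx'}\langle\phi_{x'}|\phi_x\rangle$ obtained by tracing $E$ out of $U\rho U^\dagger$.

```python
import math
import cmath

def inner(u, v):
    """<u|v> for vectors stored as lists of complex numbers."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def closed_form(rho, phis):
    """N(rho)_{xy} = rho_{xy} <phi_y|phi_x> for a generalized dephasing channel."""
    d = len(rho)
    return [[rho[x][y] * inner(phis[y], phis[x]) for y in range(d)] for x in range(d)]

def kraus_ops(phis):
    """Diagonal operation elements A_e = sum_x <e|phi_x> |x><x|, one per e."""
    d, k = len(phis), len(phis[0])
    return [[[phis[x][e] if x == y else 0j for y in range(d)] for x in range(d)]
            for e in range(k)]

def apply_kraus(ops, rho):
    """sum_e A_e rho A_e^dagger, exploiting that each A_e is diagonal."""
    d = len(rho)
    return [[sum(A[x][x] * rho[x][y] * A[y][y].conjugate() for A in ops)
             for y in range(d)] for x in range(d)]

# Hypothetical example: d = 3 inputs, environment dimension k = 2.
angles = [(0.0, 0.0), (0.4, 1.0), (1.1, -0.7)]
phis = [[complex(math.cos(t)), cmath.exp(1j * a) * math.sin(t)] for t, a in angles]
v = [0.6, 0.48j, 0.64]                      # normalized: 0.36 + 0.2304 + 0.4096 = 1
rho = [[vx * vy.conjugate() for vy in v] for vx in v]

ops = kraus_ops(phis)
out1, out2 = closed_form(rho, phis), apply_kraus(ops, rho)
err = max(abs(out1[x][y] - out2[x][y]) for x in range(3) for y in range(3))
print("max deviation:", err)
```

Because every $A_e$ is diagonal, the operation elements trivially commute, and the diagonal of $\rho$ passes through unchanged since each $|\phi_x\rangle$ is normalized.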

Let us mention that in the special case where the $\{|\phi_x\rangle\}$ are mutually orthogonal,
the channel is completely dephasing. We denote this channel by $\Delta$, and note that it
corresponds to a channel which performs a complete projective measurement in the dephasing
basis while ignoring the result. This has the effect of setting all of the off-diagonal
matrix elements of $\rho$ equal to zero. For a generalized dephasing channel $\mathcal{N}$, $\Delta$ obeys the following relations:

$$\mathcal{N}^c = \mathcal{N}^c \circ \Delta, \qquad H(\Delta(\rho)) \ge H(\rho).$$

The first holds because $\mathcal{N}^c$ depends only on the diagonal components of $\rho$, while the
second is proved in [33]. Observe that the inequality is saturated for diagonal $\rho$.
Because of this, we may write

$$
\begin{aligned}
Q(\mathcal{N}) &= \max_\rho I_c(\rho, \mathcal{N}) \\
&= \max_\rho \{H(\mathcal{N}(\rho)) - H(\mathcal{N}^c(\rho))\} \\
&= \max_\rho \{H(\mathcal{N}\circ\Delta(\rho)) - H(\mathcal{N}^c\circ\Delta(\rho))\} \\
&= \max_p \left\{H(X) - H\left(\textstyle\sum_x p(x)\phi_x\right)\right\}.
\end{aligned}
$$
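For a qubit ($d = 2$) the last maximization is one-dimensional and easy to evaluate. The sketch below assumes two pure environment states with overlap $c = |\langle\phi_0|\phi_1\rangle|$ (a hypothetical parameterization), computes the entropy of $p\phi_0 + (1-p)\phi_1$ from its determinant $p(1-p)(1-c^2)$, and maximizes over a grid of $p$. For the sample overlaps the maximum sits at $p = \tfrac12$, giving $1 - H(\tfrac{1+c}{2})$; with $c = 1-2q$ this is the familiar value $1 - H(q)$ for the qubit dephasing channel. A grid search is a sanity check, not a proof.

```python
import math

def h2(x):
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def env_entropy(p, c):
    """Entropy of p*phi_0 + (1-p)*phi_1 for pure qubit states with |<phi_0|phi_1>| = c.
    The determinant is p(1-p)(1-c^2), so the eigenvalues are (1 +/- sqrt(1 - 4 det)) / 2."""
    det = p * (1 - p) * (1 - c * c)
    lam = (1 + math.sqrt(max(0.0, 1 - 4 * det))) / 2
    return h2(lam)

def q_dephasing(c, steps=2000):
    """Grid maximization of H(X) - H(sum_x p(x) phi_x) over binary distributions p."""
    best_p, best = 0.0, -1.0
    for j in range(steps + 1):
        p = j / steps
        val = h2(p) - env_entropy(p, c)
        if val > best:
            best_p, best = p, val
    return best_p, best

for c in (0.5, 0.9, 1.0):
    p_star, q = q_dephasing(c)
    print(c, p_star, round(q, 6), round(1 - h2((1 + c) / 2), 6))
```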

9.4 Proof of additivity of Q for collective phase-flip 
channel 

While the description of the capacity region $\mathcal{Q}$ in Theorem 2 generally requires taking
a many-letter limit, we give here an example of a quantum multiple access channel
$\mathcal{N}_p: A'B' \to C$ for which that description can be single-letterized. The channel $\mathcal{N}_p$
takes as input two qubits, one from Alice and the other from Bob. With probability
$p$, the channel causes each qubit to undergo a phase flip, by rotating each by 180°
about its $z$-axis before it is received by the receiver Charlie. The action of $\mathcal{N}_p$ on an
input density operator $\rho^{A'B'}$ is described in terms of the operator sum representation
as

$$\mathcal{N}_p(\rho) = (1-p)\rho + p(\sigma_z \otimes \sigma_z)\rho(\sigma_z \otimes \sigma_z),$$

where

$$\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

is the Pauli phase flip matrix. We will demonstrate that $\mathcal{Q}(\mathcal{N}_p)$ is equal to the
collection of all pairs of nonnegative rates $(Q_A, Q_B)$ which satisfy

$$Q_A \le 1, \qquad Q_B \le 1, \qquad Q_A + Q_B \le 2 - H(p).$$

Proof. In order to prove this, we first recall that the maximum of the sum rate bound
$I_c(AB\rangle C)$ over all inputs of the form (5.2) is additive. Next, we calculate $Q(\mathcal{N}_p)$,
the single-user capacity of the channel, and observe that it is achieved for inputs
of the form (5.2), implying that the maximum sum rate bound equals the capacity.
Then, we show that for the same inputs, the bounds $I_c(A\rangle BC)$ and $I_c(B\rangle AC)$ on the
individual rates are as large as possible. The characterization in terms of a single
pentagon will then follow.

We first note that the operator sum matrices $\sqrt{p}\,\sigma_z \otimes \sigma_z$ and $\sqrt{1-p}\,\mathbb{1}_4$
commute. By results in the previous section, we conclude that $\mathcal{N}_p$ is an example of a
generalized dephasing channel and thus, the following two conditions are satisfied:

• for any state $\Omega = \mathcal{N}_p^{\otimes k}(\Psi^{ABA'^kB'^k})$ arising from $\mathcal{N}_p^{\otimes k}$ (where Alice and Bob
can jointly prepare any state at the inputs), there is a state $\omega = \mathcal{N}_p(\psi^{ABA'B'})$
for which

$$I_c(AB\rangle C)_\omega \ge \frac{1}{k}I_c(AB\rangle C^k)_\Omega.$$

Furthermore, the input density operator $\rho^{A'B'} = \mathrm{Tr}_{AB}\,\psi^{ABA'B'}$ is diagonal in
the dephasing basis of $\mathcal{N}_p$.

• for any state $\Omega = \mathcal{N}_p^{\otimes k}(\Psi_1^{AA'^k} \otimes \Psi_2^{BB'^k})$ arising from $\mathcal{N}_p^{\otimes k}$ in the sense of
(5.2), there is a state $\omega = \mathcal{N}_p(\psi_1^{AA'} \otimes \psi_2^{BB'})$ arising from $\mathcal{N}_p$ in the same sense
for which

$$I_c(AB\rangle C)_\omega \ge \frac{1}{k}I_c(AB\rangle C^k)_\Omega.$$
k 
The first condition above says that the single-user capacity $Q(\mathcal{N}_p)$ is additive. It
also guarantees that the relevant maximization is achieved by an input density operator
$\rho^{A'B'}$ which is diagonal in the dephasing basis. The second condition guarantees
that the constrained single-user capacity of $\mathcal{N}_p$, when the users are constrained to
preparing product input states, is additive.

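In order to compute $Q(\mathcal{N}_p)$, let us first write an isometric extension $U: A'B' \to CE$
of $\mathcal{N}_p$ as

$$U|x\rangle^{A'B'} = |x\rangle^C|\phi_x\rangle^E, \qquad x \in \{00, 01, 10, 11\},$$

where

$$|\phi_{00}\rangle = |\phi_{11}\rangle = \sqrt{1-p}\,|0\rangle + \sqrt{p}\,|1\rangle =: |\phi_+\rangle$$

and

$$|\phi_{01}\rangle = |\phi_{10}\rangle = \sqrt{1-p}\,|0\rangle - \sqrt{p}\,|1\rangle =: |\phi_-\rangle.$$

A complementary channel is then defined as

$$\mathcal{N}_p^c(\rho) = (\rho_{00} + \rho_{11})\,\phi_+ + (\rho_{01} + \rho_{10})\,\phi_-.$$

Observe that the output of $\mathcal{N}_p^c$ depends only on the diagonal elements of $\rho$, when
$\rho$ is written in the dephasing basis $\{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}$; here $\rho_{00}, \ldots, \rho_{11}$ denote those
diagonal elements. Define $a = \rho_{00} + \rho_{11}$. As $Q(\mathcal{N}_p)$ is achieved when $\rho$ is diagonal in this
basis, let us calculate

$$
\begin{aligned}
H(C) &= H(A'B') \\
&= H(\{\rho_{00}, \rho_{01}, \rho_{10}, \rho_{11}\}) \\
&= H(a) + a\,H\!\left(\frac{\rho_{00}}{a}, \frac{\rho_{11}}{a}\right) + (1-a)\,H\!\left(\frac{\rho_{01}}{1-a}, \frac{\rho_{10}}{1-a}\right) \\
&\le H(a) + 1,
\end{aligned}
$$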
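where the inequality is saturated when $\rho_{00} = \rho_{11} = \tfrac{a}{2}$ and $\rho_{01} = \rho_{10} = \tfrac{1-a}{2}$. Since
$\mathcal{N}_p^c(\rho)$, and hence $H(E)$, depends only on $a$, it thus suffices to optimize over the class of states

$$\rho = \frac{1}{2}\begin{pmatrix} a & 0 & 0 & 0 \\ 0 & 1-a & 0 & 0 \\ 0 & 0 & 1-a & 0 \\ 0 & 0 & 0 & a \end{pmatrix},$$

for which

$$H(C) = H(\rho) = 1 + H(a).$$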

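Note that we may express

$$\phi_\pm = \frac{1}{2}\left(\mathbb{1} \pm 2\sqrt{p(1-p)}\,\sigma_x + (1-2p)\,\sigma_z\right),$$

allowing us to write

$$\mathcal{N}_p^c(\rho) = a\,\phi_+ + (1-a)\,\phi_- = \frac{1}{2}\left(\mathbb{1} + 2(2a-1)\sqrt{p(1-p)}\,\sigma_x + (1-2p)\,\sigma_z\right),$$

so that $H(E) = H\!\left(\frac{1}{2}\left(1 + \sqrt{4p(1-p)(2a-1)^2 + (1-2p)^2}\right)\right)$. Thus,

$$I_c(\rho, \mathcal{N}_p) = 1 + H(a) - H\!\left(\frac{1}{2}\left(1 + \sqrt{4p(1-p)(2a-1)^2 + (1-2p)^2}\right)\right).$$

For fixed $p$, $I_c(\rho, \mathcal{N}_p)$ is symmetric about $a = \tfrac{1}{2}$, and has a first derivative with respect
to $a$ which is positive for $0 < a < \tfrac{1}{2}$ (and is thus negative for $\tfrac{1}{2} < a < 1$). Because it is
continuous on $0 \le a \le 1$, its maximum is attained when $a = \tfrac{1}{2}$, so that

$$\max_\rho I_c(\rho, \mathcal{N}_p) = I_c(\pi^{A'B'}, \mathcal{N}_p) = 1 + H(\tfrac{1}{2}) - H(p) = 2 - H(p).$$

So we see that the maximum is already achieved for a product state $\pi^{A'B'} = \pi^{A'} \otimes \pi^{B'}$.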
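The claim that the maximum over $a$ sits at $a = \tfrac12$ can be spot-checked numerically. The sketch below (pure Python; helper names are hypothetical) evaluates the expression for $I_c(\rho, \mathcal{N}_p)$ derived above on a grid of $a$ values for a few sample values of $p$, and compares the maximum with $2 - H(p)$. This is a sanity check on the formula, not a replacement for the derivative argument.

```python
import math

def h2(x):
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def ic_phase_flip(a, p):
    """I_c(rho, N_p) for rho = diag(a/2, (1-a)/2, (1-a)/2, a/2):
    H(C) = 1 + H(a), and H(E) uses the Bloch vector length r of a*phi_+ + (1-a)*phi_-."""
    r = math.sqrt(4 * p * (1 - p) * (2 * a - 1) ** 2 + (1 - 2 * p) ** 2)
    return 1 + h2(a) - h2((1 + r) / 2)

def best_a(p, steps=1000):
    """Grid point in [0, 1] maximizing I_c (first maximizer on ties)."""
    grid = [j / steps for j in range(steps + 1)]
    return max(grid, key=lambda a: ic_phase_flip(a, p))

for p in (0.05, 0.2, 0.4):
    a_star = best_a(p)
    print(p, a_star, round(ic_phase_flip(a_star, p), 6), round(2 - h2(p), 6))
```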
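Define the Bell states

$$|\Phi_\pm\rangle = \frac{1}{\sqrt{2}}\left(|00\rangle \pm |11\rangle\right).$$

As $|\Phi_+\rangle^{AA'}$ purifies the maximally mixed state $\pi_2$, let us define the global state
$\omega = (\mathbb{1}^{AB} \otimes \mathcal{N}_p)(\Phi_+^{AA'} \otimes \Phi_+^{BB'})$. Identifying $C = A'B'$ in the obvious way, let us reexpress

$$\omega^{AA'BB'} = (1-p)\,\Phi_+^{AA'} \otimes \Phi_+^{BB'} + p\,\Phi_-^{AA'} \otimes \Phi_-^{BB'}.$$

It is now a simple task to calculate

$$
\begin{aligned}
H(ABC) &= H(\omega) = H(p) \\
H(C) &= H(\pi^{A'B'}) = 2 \\
H(AC) &= H(AA') + H(B') = H(p) + 1 = H(BC).
\end{aligned}
$$

Combining these gives the relevant coherent informations

$$
\begin{aligned}
I_c(AB\rangle C) &= H(C) - H(ABC) = 2 - H(p) \\
I_c(A\rangle BC) &= H(BC) - H(ABC) = 1 + H(p) - H(p) = 1 \\
I_c(B\rangle AC) &= H(AC) - H(ABC) = 1.
\end{aligned}
$$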

As we saw in Section EH, $I_c(A\rangle BC) \le \log|A'| = 1$ and $I_c(B\rangle AC) \le \log|B'| = 1$
for any state arising from $\mathcal{N}_p$. The individual rate bounds are thus saturated and the
claim follows. ■
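The structural fact behind $H(ABC) = H(p)$ is that the phase flip maps $\Phi_+ \otimes \Phi_+$ to the orthogonal state $\Phi_- \otimes \Phi_-$, so $\omega$ is a mixture of two orthogonal pure states. The sketch below (pure Python; names hypothetical, qubit order $A, A', B, B'$) checks this, then evaluates the coherent informations from the entropies listed above and tests membership in the pentagon $Q_A \le 1$, $Q_B \le 1$, $Q_A + Q_B \le 2 - H(p)$.

```python
import math
from itertools import product

def h2(x):
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def bell_product(sign_a, sign_b):
    """Amplitudes of |Phi_{sign_a}>^{AA'} (x) |Phi_{sign_b}>^{BB'}, keyed by (a, a', b, b');
    |Phi_{+/-}> = (|00> +/- |11>)/sqrt(2), so each nonzero amplitude is +/- 1/2."""
    vec = {}
    for a, ap, b, bp in product((0, 1), repeat=4):
        amp = 0.0
        if a == ap and b == bp:
            amp = 0.5 * (sign_a if a else 1) * (sign_b if b else 1)
        vec[(a, ap, b, bp)] = amp
    return vec

v = bell_product(+1, +1)
# sigma_z (x) sigma_z on A'B' multiplies each amplitude by (-1)^(a' + b')
w = {key: amp * (-1) ** (key[1] + key[3]) for key, amp in v.items()}
overlap = sum(v[key] * w[key] for key in v)   # amplitudes are real here

def coherent_infos(p):
    """Coherent informations (bits) from the entropies derived in the text:
    H(ABC) = H(p), H(C) = 2, H(AC) = H(BC) = 1 + H(p)."""
    H_ABC, H_C, H_AC, H_BC = h2(p), 2.0, 1.0 + h2(p), 1.0 + h2(p)
    return {"AB>C": H_C - H_ABC, "A>BC": H_BC - H_ABC, "B>AC": H_AC - H_ABC}

def in_region(QA, QB, p):
    """Claimed region: 0 <= Q_A <= 1, 0 <= Q_B <= 1, Q_A + Q_B <= 2 - H(p)."""
    return 0 <= QA <= 1 and 0 <= QB <= 1 and QA + QB <= 2 - h2(p)

print(w == bell_product(-1, -1), overlap)
print(coherent_infos(0.1))
print(in_region(1.0, 0.5, 0.1), in_region(1.0, 0.6, 0.1))
```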
Chapter 10

Discussion



There have been a number of results analyzing multiterminal coding problems in
quantum Shannon theory. For an i.i.d. classical-quantum source $XB$, Devetak and
Winter J3] have proved a Slepian-Wolf-like coding theorem achieving the cq rate
pair $(H(X|B), H(B))$ for classical data compression with quantum side information.
Such codes extract classical side information from $B^n$ to aid in compressing $X^n$.
The extraction of side information is done in such a way as to cause a negligible
disturbance to $B^n$. Our Theorem 1 is somewhat of this flavor. There, the quantum
state of $C^n$ is measured to extract Alice's classical message which, in turn, is used as
side information for decoding Bob's quantum information. Analogous results to ours
were obtained by Winter in his analysis of a multiple access channel with classical
inputs and a quantum output, whereby the classical decoded message of one sender
can be used as side information to increase the classical capacity of another sender.

We further mention the obvious connection between our coding theorems and
the subject of channel codes with side information available to the receiver. The
more difficult problem of classical and quantum capacities when side information
is available at the encoder is analyzed by Devetak and Yard in ^7], constituting
quantum generalizations of results obtained by Gelfand and Pinsker [25] for classical
channels with side information.

In an earlier draft of [SH], we characterized $\mathcal{Q}(\mathcal{N})$ as the closure of a regularized
union of rectangles.

This solution had been conjectured on the basis of a duality between classical
Slepian-Wolf distributed source coding and classical multiple-access channels [TTl[Tn], as well
as on a purported no-go theorem for distributed data compression of so-called
irreducible pure state ensembles that appeared in an early version of [3]. After the
earlier preprint was made available, Andreas Winter announced jHO] recent progress
with Jonathan Oppenheim and Michal Horodecki on the quantum Slepian-Wolf
problem, offering a characterization identical in functional form to the classical one,
while also supplying an interpretation of negative rates and apparently evading the
no-go theorem. Motivated by the earlier mentioned duality, he informed us that the
qq capacity region could also be characterized in direct analogy to the classical case.
Subsequently, we found that we could modify our previous coding theorem to achieve
the new region, provided that the rates are nonnegative. After those events unfolded,
the authors of jH] found an error in the proof of their no-go theorem, leading to a
revised version consistent with the newer developments. Our earlier characterization
of $\mathcal{Q}(\mathcal{N})$, while correct, is contained in the rate region of Theorem 2 for any finite $k$,
frequently strictly so. The newer theorem, therefore, gives a more accurate
approximation to the rate region for finite $k$. In fact, for any state arising from the channel
which does not saturate the strong subadditivity inequality, the corresponding
pentagon and rectangle regions are distinct. As seen in Section another beneficial
feature of the new characterization is that for any channel which is degradable, the
maximum sum rate bound $R + S \le \max I_c(AB\rangle C)$ is additive, where the
maximization is over all states of the form (5.2). Furthermore, recall that in Section 9.4 the
pentagon characterization was single-letterized for the collective phase flip channel.
On the other hand, computer calculations have revealed that the rectangle region
does not lead to a single-letter characterization of that channel. This seems to
indicate that the newer characterization is the "correct" one, at least for that particular
channel.

More recently, we discovered that the same technique used to prove the new
characterization of $\mathcal{Q}(\mathcal{N})$ implies a new cq coding theorem, and thus a new
characterization of $\mathcal{CQ}(\mathcal{N})$. By techniques nearly identical to those employed in the coding
theorem for Theorem 2, it is possible to achieve the cq rate pair

$$(R, Q) = \left(I(X;BC), I_c(B\rangle C)\right),$$

corresponding to Bob's quantum information being used as side information for
decoding Alice's classical message. This is accomplished by having Charlie isometrically
decode Bob's quantum information, then coherently decode to produce an effective
channel $\mathcal{N}_1: A' \to BC$ so that Alice can transmit classically at a higher rate. The
new characterization is then a regularized union of pentagons, consisting of pairs of
nonnegative rates $(R, Q)$ satisfying

$$
\begin{aligned}
R &\le I(X;BC) \\
Q &\le I_c(B\rangle CX) \\
R + Q &\le I(X;C) + I_c(B\rangle CX) = I(X;BC) + I_c(B\rangle C).
\end{aligned}
$$
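The equality in the sum-rate bound can be verified by expanding both sides in entropies, using $I_c(B\rangle C) = H(C) - H(BC)$ and its conditional version $I_c(B\rangle CX) = H(CX) - H(BCX)$; both sides reduce to the same expression:

```latex
\begin{aligned}
I(X;C) + I_c(B\rangle CX) &= [H(X) + H(C) - H(CX)] + [H(CX) - H(BCX)] \\
&= H(X) + H(C) - H(BCX), \\
I(X;BC) + I_c(B\rangle C) &= [H(X) + H(BC) - H(BCX)] + [H(C) - H(BC)] \\
&= H(X) + H(C) - H(BCX).
\end{aligned}
```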

Surprisingly, it is thus possible to characterize each of $\mathcal{CQ}(\mathcal{N})$ and $\mathcal{Q}(\mathcal{N})$ in terms
of pentagons, in analogy to the original classical result. This situation makes
apparent the dangers of being satisfied with regularized expressions for capacity regions.
Without being able to prove single-letterization steps in the converses, it is hard to
determine which characterization is the "right" one. While it is intuitively
satisfying to see analogous formulae appear in both the classical and quantum theories, the
regularized nature of the quantum results blurs the similarity. Indeed, the problems
with single-letterization for single-user channels appear to be amplified when
analyzing quantum networks (see, e.g., jUj). While $\mathcal{Q}$ is additive for the collective phase flip
channel of Section 9.4, this behavior does not appear to be generic for the classes of
degradable or generalized dephasing channels, as the saturation of the individual rate
bounds for that example seems to be the source of additivity. Perhaps this indicates
that understanding the capacities of single-user channels at a level
beyond regularized optimizations is even more pressing than previously thought. It
should be mentioned that for the erasure channel analyzed in Section 9.1, the newer
description of $\mathcal{CQ}(\mathcal{N})$ is not an issue, as the new corner point is contained in the old
rectangle for any state arising from any number of parallel instances of the erasure
channel.

Consider the full simultaneous classical-quantum region $\mathcal{S}(\mathcal{N})$ defined in Section 5.2.
This region can be characterized in a way that generalizes Theorems 1 and
2 as the regularization of the region $\mathcal{S}^{(1)}(\mathcal{N})$, defined as the vectors of nonnegative
rates $(R_A, R_B, Q_A, Q_B)$ satisfying

$$
\begin{aligned}
R_A &\le I(X;C|Y) \\
R_B &\le I(Y;C|X) \\
R_A + R_B &\le I(XY;C) \\
Q_A &\le I_c(A\rangle BCXY) \\
Q_B &\le I_c(B\rangle ACXY) \\
Q_A + Q_B &\le I_c(AB\rangle CXY)
\end{aligned}
$$

for some state of the form

$$\omega = \sum_{x,y} p(x)p(y)\,|x\rangle\langle x|^X \otimes |y\rangle\langle y|^Y \otimes (\mathbb{1}^{AB} \otimes \mathcal{N})(\phi_x^{AA'} \otimes \psi_y^{BB'}),$$

arising from the action of $\mathcal{N}$ on the $A'$ and $B'$ parts of some pure state ensembles
$\{p(x), |\phi_x\rangle^{AA'}\}$ and $\{p(y), |\psi_y\rangle^{BB'}\}$. Briefly, achievability of this region is obtained as
follows. Using techniques introduced in each sender "shapes" their quantum
information into HSW codewords. Decoding is accomplished by first decoding all
of the classical information, then using that information as side information for a
quantum decoder. A formal proof of the achievability of this region is found in
[52]. The main result of ^^1, the regularized optimization of the cq result from |^
over pairs of input ensembles, and our Theorems 1 and 2 follow as corollaries of
the corresponding capacity theorem. Indeed, the six two-dimensional "shadows" of
the above region, obtained by setting pairs of rates equal to zero, reproduce those
aforementioned results. This characterization, however, only utilizes the rectangle
description of $\mathcal{CQ}(\mathcal{N})$. It is indeed possible to write a more accurate regularized
description of $\mathcal{S}(\mathcal{N})$ which generalizes the pentagon characterizations of $\mathcal{CQ}(\mathcal{N})$ and
$\mathcal{Q}(\mathcal{N})$, although we will not pursue that at this time.



Chapter 11 
Appendix 



11.1 Quantum instruments and coherent information

For some finite set $S$, consider a labelled collection of channels $\{\mathcal{N}_s\}_{s\in S}$, where
$\mathcal{N}_s: A' \to B$. Define an instrument $\mathcal{N}: A' \to SB$ to act as

$$\mathcal{N}: \tau \mapsto \sum_s p(s)\,|s\rangle\langle s|^S \otimes \mathcal{N}_s(\tau).$$

An instrument channel such as $\mathcal{N}$ may be interpreted as one with classical state
information made available to the receiver. We will show that every channel $\mathcal{N}^c: A' \to E$
which is complementary to $\mathcal{N}$ is an instrument as well, as the environment $E$
contains a copy of $S$. In other words, the classical state information is also available
to an eavesdropper with full control of the environment.

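An isometric extension $U$ of $\mathcal{N}$ may be constructed as follows. First, fix isometric
extensions $U_s: A' \to E'B$ for the individual $\mathcal{N}_s$'s. Then, define $U: A' \to SEB$ via

$$U = \sum_s \sqrt{p(s)}\,|s\rangle^S|s\rangle^{E''} \otimes U_s,$$

taking $E = E'E''$. That this is indeed an isometry is evident, because
$U^\dagger U = \sum_s p(s)\,U_s^\dagger U_s = \sum_s p(s)\,\mathbb{1}^{A'} = \mathbb{1}^{A'}$. We further check that $U$ is in fact an
extension of $\mathcal{N}$, by calculating

$$
\begin{aligned}
\mathrm{Tr}_E\, U(\tau) &= \mathrm{Tr}_{E'} \sum_s p(s)\,|s\rangle\langle s|^S \otimes U_s(\tau) \\
&= \mathcal{N}(\tau).
\end{aligned}
$$

Thus, the action of the complementary channel $\mathcal{N}^c$ can be defined via $U$ as

$$
\begin{aligned}
\mathcal{N}^c(\tau) &= \mathrm{Tr}_{SB}\, U(\tau) \\
&= \mathrm{Tr}_B \sum_s p(s)\,|s\rangle\langle s|^{E''} \otimes U_s(\tau) \\
&= \sum_s p(s)\,|s\rangle\langle s|^{E''} \otimes \mathcal{N}_s^c(\tau),
\end{aligned}
$$

where the $\mathcal{N}_s^c = \mathrm{Tr}_B\, U_s$ are complementary channels to the $\mathcal{N}_s$'s.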

11.2 Proof of convexity of CQ and Q 

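Let $\mathcal{N}: A'B' \to C$ be a quantum multiple access channel. We will prove that $\mathcal{Q}(\mathcal{N})$ is
convex, as the proof for $\mathcal{CQ}$ is identical. Let $k_0$ and $k_1$ be positive integers, and fix any
two states $\sigma_0$ and $\sigma_1$ of the form (5.2), arising from $\mathcal{N}^{\otimes k_0}$ and $\mathcal{N}^{\otimes k_1}$ respectively. Then
$(R_0, S_0), (R_1, S_1) \in \mathcal{Q}(\mathcal{N})$, where for $i \in \{0, 1\}$,

$$R_i = \frac{1}{k_i}I_c(A_i\rangle C^{k_i})_{\sigma_i}, \qquad S_i = \frac{1}{k_i}I_c(B_i\rangle C^{k_i})_{\sigma_i}.$$

We will now show that for any rational $0 \le \lambda \le 1$, $\lambda(R_0, S_0) + (1-\lambda)(R_1, S_1) \in \mathcal{Q}(\mathcal{N})$.
We first write $\lambda = \alpha/\beta$ for integers satisfying $\beta > 0$ and $\beta \ge \alpha \ge 0$. Setting $p_0 = \alpha k_1$,
$p_1 = (\beta - \alpha)k_0$, and $k = p_0k_0 + p_1k_1$, define the composite systems $A = A_0^{p_0}A_1^{p_1}$ and
$B = B_0^{p_0}B_1^{p_1}$, as well as the density matrix $\sigma = \sigma_0^{\otimes p_0} \otimes \sigma_1^{\otimes p_1}$, which is also of the
form (5.2). Additivity of coherent information across product states and some simple
algebra gives

$$
\begin{aligned}
\frac{1}{k}I_c(A\rangle C^k)_\sigma &= \frac{p_0}{k}I_c(A_0\rangle C^{k_0})_{\sigma_0} + \frac{p_1}{k}I_c(A_1\rangle C^{k_1})_{\sigma_1} \\
&= \frac{p_0k_0R_0 + p_1k_1R_1}{p_0k_0 + p_1k_1} \\
&= \lambda R_0 + (1-\lambda)R_1.
\end{aligned}
$$

An identical calculation shows that $\frac{1}{k}I_c(B\rangle C^k)_\sigma = \lambda S_0 + (1-\lambda)S_1$. As $\mathcal{Q}(\mathcal{N})$ was
defined as the topological closure of rate pairs corresponding to states which
appropriately arise from the channel, the result follows because the set of rational
$\lambda$'s considered above is a dense subset of the unit interval.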
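The "simple algebra" can be checked exactly with rational arithmetic. In the sketch below, the helper name is hypothetical and the rates are arbitrary sample values not derived from any channel; additivity is taken as given, and the code confirms that the choice $p_0 = \alpha k_1$, $p_1 = (\beta-\alpha)k_0$ yields exactly the convex combination with weight $\lambda = \alpha/\beta$, using $k = \beta k_0 k_1$ channel uses in total.

```python
from fractions import Fraction

def interpolate(R0, R1, k0, k1, alpha, beta):
    """Per-use rate of sigma_0^{(x) p0} (x) sigma_1^{(x) p1}, p0 = alpha*k1, p1 = (beta-alpha)*k0.

    R0, R1 are per-channel-use rates of the two states; by additivity the total
    is p0*k0*R0 + p1*k1*R1, spread over k = p0*k0 + p1*k1 channel uses."""
    p0, p1 = alpha * k1, (beta - alpha) * k0
    k = p0 * k0 + p1 * k1
    return Fraction(p0 * k0, k) * R0 + Fraction(p1 * k1, k) * R1, k

R0, R1 = Fraction(3, 5), Fraction(1, 7)          # hypothetical per-use rates
cases = [(2, 3, 1, 4), (5, 1, 2, 3), (4, 4, 0, 1), (1, 6, 5, 5)]
for (k0, k1, alpha, beta) in cases:
    rate, k = interpolate(R0, R1, k0, k1, alpha, beta)
    lam = Fraction(alpha, beta)
    print(k0, k1, lam, rate, rate == lam * R0 + (1 - lam) * R1, k == beta * k0 * k1)
```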



11.3 Proof of cardinality bound on X. 

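Begin by fixing a finite set $\mathcal{X}$, a labelled collection of pure states $\{|\phi_x\rangle^{A'}\}_{x\in\mathcal{X}}$, and a
pure bipartite state $|\psi\rangle^{BB'}$. For each $x$, these define the states $\sigma_x = \mathcal{N}(\phi_x \otimes \psi)$ and
$\omega_x = \mathrm{Tr}_B\,\sigma_x$. Assume for now that $|A'| \ge |C|$. Define a mapping $f: \mathcal{X} \to \mathbb{R}^{|C|^2+1}$
via

$$f: x \mapsto f_x = \left(\omega_x, H(\omega_x), I_c(B\rangle C)_{\sigma_x}\right),$$

where we are considering $\omega_x$ to be synonymous with its $(|C|^2 - 1)$-dimensional
parameterization. By linearity, this extends to a map from probability mass functions on $\mathcal{X}$
to $\mathbb{R}^{|C|^2+1}$, where

$$f: p(x) \mapsto \sum_x p(x)f_x = \left(\omega_p, H(C|X)_p, I_c(B\rangle CX)_p\right).$$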
Our use of the subscript p should be clear from the context. The use of Caratheodory's 
theorem for bounding the support sizes of auxiliary random variables in information 
theory (see is well-known. Perhaps less familiar is the observation j^UE] that 
a better bound can often be obtained by use of a related theorem by Fenchel and
Eggleston [20], which states that if $S \subseteq \mathbb{R}^n$ is the union of at most $n$ connected
subsets, and if $y$ is contained in the convex hull of $S$, then $y$ is also contained in the
convex hull of at most $n$ points in $S$. As the map $f$ is linear, it maps the simplex of
distributions on $\mathcal{X}$ into a single connected subset of $\mathbb{R}^{|C|^2+1}$. Thus, for any distribution
$p(x)$, there is another distribution $p'(x)$ which puts positive probability on at most
$|C|^2 + 1$ states, while satisfying $f(p) = f(p')$. If it is instead the case that $|A'| < |C|$,
this bound can be reduced to $|A'|^2 + 1$ by replacing the first components of the map
$f$ with a parameterization of $\phi_x^{A'}$, as specification of a density matrix on $A'$ is enough
to completely describe the resulting state on $C$. It is therefore sufficient to consider
$|\mathcal{X}| \le \min\{|A'|, |C|\}^2 + 1$ when computing $\mathcal{CQ}^{(1)}(\mathcal{N})$.